SECRETED PROTEINS AND NUCLEIC ACIDS ENCODING THEM

Related Application Information
This application is a continuation-in-part of
application serial number 09/164,169, filed October 2,
1998, which is a continuation-in-part of application
serial number 09/164,220, filed September 30, 1998.

Background of the Invention 

Many secreted proteins, for example, cytokines and
cytokine receptors, play a vital role in the regulation
of cell growth, cell differentiation, and a variety of
specific cellular responses. A number of medically
useful proteins, including erythropoietin, granulocyte-
macrophage colony stimulating factor, human growth
hormone, and various interleukins, are secreted proteins.
Thus, an important goal in the design and development of
new therapies is the identification and characterization
of secreted and transmembrane proteins and the genes
which encode them.

Many secreted proteins are receptors which bind a
ligand and transduce an intracellular signal, leading to
a variety of cellular responses. The identification and
characterization of such a receptor enables one to
identify both the ligands which bind to the receptor and
the intracellular molecules and signal transduction
pathways associated with the receptor, permitting one to
identify or design modulators of receptor activity, e.g.,
receptor agonists or antagonists and modulators of signal
transduction.



Summary of the Invention 
The present invention is based, at least in part, on
the discovery of cDNA molecules encoding TANGO 180, TANGO
181, TANGO 182, TANGO 183, TANGO 184, TANGO 185, TANGO
186, TANGO 187, TANGO 188, TANGO 189, and TANGO 215, all
of which are predicted to be either wholly secreted or
transmembrane proteins. These proteins, fragments,
derivatives, and variants thereof are collectively
referred to as "polypeptides of the invention" or
"proteins of the invention." Nucleic acid molecules
encoding polypeptides of the invention are collectively
referred to as "nucleic acids of the invention."

The nucleic acids and polypeptides of the present
invention are useful as modulating agents in regulating a
variety of cellular processes. Accordingly, in one
aspect, the present invention provides isolated nucleic
acid molecules encoding a polypeptide of the invention or
a biologically active portion thereof. The present
invention also provides nucleic acid molecules which are
suitable as primers or hybridization probes for the
detection of nucleic acids encoding a polypeptide of the
invention.

The invention features nucleic acid molecules which are
at least 45% (or 55%, 65%, 75%, 85%, 95%, or 98%)
identical to the nucleotide sequence of any of SEQ ID
NOs:1-22, 34-43, and ___, or the nucleotide sequence
of the cDNA of a clone deposited with ATCC as any of
Accession Numbers 98899, 98900 and 98901 (the "cDNA of a
clone deposited as any of ATCC 98899, 98900, and
989001"), or a complement thereof.

The invention features nucleic acid molecules which
include a fragment of at least 300 (325, 350, 375, 400,
425, 450, 500, 550, 600, 650, 700, 800, 900, 1000, or
1200) nucleotides of the nucleotide sequence of any of
SEQ ID NOs:1-22, 34-43, and ___, or the nucleotide
sequence of the cDNA of a clone deposited as any of ATCC
98899, 98900, and 989001, or a complement thereof.

The invention also features nucleic acid molecules
which include a nucleotide sequence encoding a protein
having an amino acid sequence that is at least 45% (or
55%, 65%, 75%, 85%, 95%, or 98%) identical to the amino
acid sequence of any of SEQ ID NOs:23-33, 54-63, and ___,
or the amino acid sequence encoded by the cDNA of a
clone deposited as any of ATCC 98899, 98900, and 989001,
or a complement thereof.

In preferred embodiments, the nucleic acid molecules
have the nucleotide sequence of any of SEQ ID NOs:1-22,
34-43, and ___, or the nucleotide sequence of the cDNA
of a clone deposited as any of ATCC 98899, 98900, and
989001.

Also within the invention are nucleic acid molecules
which encode a fragment of a polypeptide having the amino
acid sequence of any of SEQ ID NOs:23-33, 54-63, and
___, the fragment including at least 15 (25, 30, 50,
100, 150, 300, or 400) contiguous amino acids of any of
SEQ ID NOs:23-33, 54-63, and ___, or the polypeptide
encoded by the cDNA of a clone deposited as any of ATCC
98899, 98900, and 989001.

The invention includes nucleic acid molecules which
encode a naturally occurring allelic variant of a
polypeptide comprising the amino acid sequence of any of
SEQ ID NOs:23-33, 54-63, and ___, or an amino acid
sequence encoded by the cDNA of a clone deposited as any
of ATCC 98899, 98900, and 989001, wherein the nucleic
acid molecule hybridizes under stringent conditions to a
nucleic acid molecule having a nucleic acid sequence
encoding any of SEQ ID NOs:22-33, 54-63, and ___,
or a complement thereof.

Also within the invention are: isolated polypeptides or
proteins having an amino acid sequence that is at least
about 65%, preferably 75%, 85%, 95%, or 98% identical to
the amino acid sequence of any of SEQ ID NOs:22-33,
54-63, and ___.

Also within the invention are: isolated polypeptides or
proteins which are encoded by a nucleic acid molecule
having a nucleotide sequence that is at least about 65%,
preferably 75%, 85%, or 95% identical to the nucleic acid
sequence encoding any of SEQ ID NOs:22-33, 54-63, and
___, and isolated polypeptides or proteins which are
encoded by a nucleic acid molecule having a nucleotide
sequence which hybridizes under stringent hybridization
conditions to a nucleic acid molecule having the sequence
of any of SEQ ID NOs:1-22, 34-43, and ___, and a
complement thereof or the non-coding strand of the cDNA
of a clone deposited as any of ATCC 98899, 98900, and
989001.

Also within the invention are polypeptides which are
naturally occurring allelic variants of a polypeptide
that includes the amino acid sequence of any of SEQ ID
NOs:22-33, 54-63, and ___, or an amino acid sequence
encoded by the cDNA of a clone deposited as any of ATCC
98899, 98900, and 989001, wherein the polypeptide is
encoded by a nucleic acid molecule which hybridizes under
stringent conditions to a nucleic acid molecule having
the sequence of any of SEQ ID NOs:1-22, 34-43, and ___,
or a complement thereof.

The invention also features nucleic acid molecules that
hybridize under stringent conditions to a nucleic acid
molecule comprising the nucleotide sequence of any of SEQ
ID NOs:1-22, 34-43, and ___, of the cDNA of a clone
deposited as any of ATCC 98899, 98900, and 989001, or a
complement thereof. In other embodiments, the nucleic
acid molecules are at least 300 (325, 350, 375, 400, 425,
450, 500, 550, 600, 650, 700, 800, 900, 1000, or 1200)
nucleotides in length and hybridize under stringent
conditions to a nucleic acid molecule comprising the
nucleotide sequence of any of SEQ ID NOs:1-22, 34-43, and
___, of the cDNA of a clone deposited as any of ATCC
98899, 98900, and 989001, or a complement thereof. In
preferred embodiments, the isolated nucleic acid
molecules encode a cytoplasmic, transmembrane, or
extracellular domain of a polypeptide of the invention.
In another embodiment, the invention provides an isolated
nucleic acid molecule which is antisense to the coding
strand of a nucleic acid of the invention.

Another aspect of the invention provides vectors, e.g.,
recombinant expression vectors, comprising a nucleic acid
molecule of the invention. In another embodiment the
invention provides host cells containing such a vector.
The invention also provides methods for producing a
polypeptide of the invention by culturing, in a suitable
medium, a host cell of the invention containing a
recombinant expression vector encoding a polypeptide of
the invention such that the polypeptide of the invention
is produced.

Another aspect of this invention features isolated or
recombinant proteins and polypeptides of the invention.
Preferred proteins and polypeptides possess at least one
biological activity possessed by the corresponding
naturally-occurring human polypeptide. An activity, a
biological activity, and a functional activity of a
polypeptide of the invention refer to an activity
exerted by a protein or polypeptide of the invention on a
responsive cell as determined in vivo, or in vitro,
according to standard techniques. Such activities can be
a direct activity, such as an association with or an
enzymatic activity on a second protein, or an indirect
activity, such as a cellular signaling activity mediated
by interaction of the protein with a second protein.
Thus, such activities include, e.g., (1) the ability to
form protein-protein interactions with proteins in the
signaling pathway of the naturally-occurring
polypeptide; (2) the ability to bind a ligand of the
naturally-occurring polypeptide; and (3) the ability to
bind to an intracellular target of the naturally-occurring
polypeptide. Other activities include: (1) the ability
to modulate cellular proliferation; (2) the ability to
modulate cellular differentiation; and (3) the ability to
modulate cell death.

In one embodiment, a polypeptide of the invention has
an amino acid sequence sufficiently identical to an
identified domain of a polypeptide of the invention. As
used herein, the term "sufficiently identical" refers to
a first amino acid or nucleotide sequence which contains
a sufficient or minimum number of identical or equivalent
(e.g., with a similar side chain) amino acid residues or
nucleotides to a second amino acid or nucleotide sequence
such that the first and second amino acid or nucleotide
sequences have a common structural domain and/or common
functional activity. For example, amino acid or
nucleotide sequences which contain a common structural
domain having about 65% identity, preferably 75%
identity, more preferably 85%, 95%, or 98% identity are
defined herein as sufficiently identical.
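
The identity thresholds in the definition above can be made
concrete with a small calculation. The following is a minimal
sketch, not a method prescribed by this specification: it assumes
the two sequences have already been aligned to equal length, uses
"-" as a hypothetical gap symbol, and counts identity over aligned
columns. The function names and the 65% default are illustrative
only, and "equivalent" residues (similar side chains) are not
modeled.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity between two sequences that are already aligned.

    Assumption: identity = identical columns / aligned columns * 100,
    where columns that are gaps in both sequences are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep every column except those where both sequences show a gap.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == gap and b == gap)]
    if not columns:
        return 0.0
    # A column matches only when both residues are equal and not a gap.
    matches = sum(1 for a, b in columns if a == b and a != gap)
    return 100.0 * matches / len(columns)


def sufficiently_identical(aligned_a: str, aligned_b: str,
                           threshold: float = 65.0) -> bool:
    """True when percent identity meets the chosen threshold (65% by default)."""
    return percent_identity(aligned_a, aligned_b) >= threshold
```

Raising `threshold` to 75, 85, 95, or 98 corresponds to the
progressively preferred identity levels listed in the text.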

In one embodiment, the isolated polypeptide of the
invention lacks both a transmembrane and a cytoplasmic
domain. In another embodiment, the polypeptide lacks
both a transmembrane domain and a cytoplasmic domain and
is soluble under physiological conditions.

The polypeptides of the present invention, or
biologically active portions thereof, can be operably
linked to a heterologous amino acid sequence to form
fusion proteins. The invention further features
antibodies that specifically bind a polypeptide of the
invention such as monoclonal or polyclonal antibodies.

WO 00/18904                              PCT/US99/22817

In addition, the polypeptides of the invention or
biologically active portions thereof can be incorporated
into pharmaceutical compositions, which optionally
include pharmaceutically acceptable carriers.

In another aspect, the present invention provides
methods for detecting the presence of the activity or
expression of a polypeptide of the invention in a
biological sample by contacting the biological sample
with an agent capable of detecting an indicator of
activity such that the presence of activity is detected
in the biological sample.

In another aspect, the invention provides methods for
modulating activity of a polypeptide of the invention
comprising contacting a cell with an agent that modulates
(inhibits or stimulates) the activity or expression of a
polypeptide of the invention such that activity or
expression in the cell is modulated. In one embodiment,
the agent is an antibody that specifically binds to a
polypeptide of the invention.

In another embodiment, the agent modulates expression
of a polypeptide of the invention by modulating
transcription, splicing, or translation of an mRNA
encoding a polypeptide of the invention. In yet another
embodiment, the agent is a nucleic acid molecule having a
nucleotide sequence that is antisense to the coding
strand of an mRNA encoding a polypeptide of the
invention.

The present invention also provides methods to treat a
subject having a disorder characterized by aberrant
activity of a polypeptide of the invention or aberrant
expression of a nucleic acid of the invention by
administering an agent which is a modulator of the
activity of a polypeptide of the invention or a modulator
of the expression of a nucleic acid of the invention to
the subject. In one embodiment, the modulator is a
protein of the invention. In another embodiment, the
modulator is a nucleic acid of the invention. In other
embodiments, the modulator is a peptide, peptidomimetic,
or other small molecule.

The present invention also provides diagnostic assays
for identifying the presence or absence of a genetic
lesion or mutation characterized by at least one of: (i)
aberrant modification or mutation of a gene encoding a
polypeptide of the invention, (ii) mis-regulation of a
gene encoding a polypeptide of the invention, and (iii)
aberrant post-translational modification of a polypeptide
of the invention, wherein a wild-type form of the gene
encodes a polypeptide having the activity of the
polypeptide of the invention.

In another aspect, the invention provides a method for
identifying a compound that binds to or modulates the
activity of a polypeptide of the invention. In general,
such methods entail measuring a biological activity of
the polypeptide in the presence and absence of a test
compound and identifying those compounds which alter the
activity of the polypeptide.
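
The comparison described above (activity with and without a test
compound) can be sketched as a simple filter. This is only an
illustration under stated assumptions: the function name, the data
layout, and the 1.5-fold cutoff are hypothetical and are not taken
from this specification, which does not fix any numerical
criterion.

```python
def find_modulators(measurements: dict, min_fold_change: float = 1.5) -> dict:
    """Flag compounds whose presence alters a measured activity.

    `measurements` maps a compound name to a pair
    (activity_without_compound, activity_with_compound).
    Assumption: a compound counts as a modulator when the activity
    changes by at least `min_fold_change` in either direction.
    """
    if min_fold_change <= 1.0:
        raise ValueError("min_fold_change must be greater than 1")
    hits = {}
    for name, (baseline, treated) in measurements.items():
        if baseline <= 0 or treated < 0:
            raise ValueError(f"invalid activity values for {name!r}")
        ratio = treated / baseline
        if ratio >= min_fold_change:
            hits[name] = "stimulates"
        elif ratio <= 1.0 / min_fold_change:
            hits[name] = "inhibits"
    return hits
```

A real screen would also need replicates and a statistical test;
the fixed fold-change cutoff here only stands in for "alters the
activity".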

The invention also features methods for identifying a
compound which modulates the expression of a polypeptide
or nucleic acid of the invention by measuring the
expression of the polypeptide or nucleic acid in the
presence and absence of the compound.

Other features and advantages of the invention will be
apparent from the following detailed description and
claims.

Brief Description of the Drawings 
Figure 1 depicts the cDNA sequence (SEQ ID NO:1) and
predicted amino acid sequence (SEQ ID NO:23) of human
TANGO 180.

Figure 2 depicts the cDNA sequence (SEQ ID NO:34) and
predicted amino acid sequence (SEQ ID NO:54) of murine
TANGO 180.

Figure 3 depicts the cDNA sequence (SEQ ID NO:2) and
predicted amino acid sequence (SEQ ID NO:24) of human
TANGO 181.

Figure 4 depicts the partial cDNA sequence (SEQ ID
NO:35; partial) and predicted amino acid sequence (SEQ ID
NO:55; partial) of murine TANGO 181.

Figure 5 depicts the cDNA sequence (SEQ ID NO:3) and
predicted amino acid sequence (SEQ ID NO:25) of human
TANGO 182.

Figure 6 depicts the partial cDNA sequence (SEQ ID
NO:36; partial) and predicted amino acid sequence (SEQ ID
NO:56; partial) of murine TANGO 182.

Figure 7 depicts the cDNA sequence (SEQ ID NO:4) and
predicted amino acid sequence (SEQ ID NO:26) of human
TANGO 183.

Figure 8 depicts the cDNA sequence (SEQ ID NO:37) and
predicted amino acid sequence (SEQ ID NO:57) of murine
TANGO 183.

Figure 9 depicts the cDNA sequence (SEQ ID NO:5) and
predicted amino acid sequence (SEQ ID NO:27) of human
TANGO 184.

Figure 10 depicts the cDNA sequence (SEQ ID NO:38) and
predicted amino acid sequence (SEQ ID NO:58) of murine
TANGO 184.

Figure 11 depicts the cDNA sequence (SEQ ID NO:6) and
predicted amino acid sequence (SEQ ID NO:28) of human
TANGO 185.

Figure 12 depicts the cDNA sequence (SEQ ID NO:39) and
predicted amino acid sequence (SEQ ID NO:59) of murine
TANGO 185.

Figure 13 depicts the cDNA sequence (SEQ ID NO:7) and
predicted amino acid sequence (SEQ ID NO:29) of human
TANGO 186.

Figure 14 depicts the cDNA sequence (SEQ ID NO:40) and
predicted amino acid sequence (SEQ ID NO:60) of murine
TANGO 186.

Figure 15 depicts the cDNA sequence (SEQ ID NO:8) and
predicted amino acid sequence (SEQ ID NO:30) of human
TANGO 188.

Figure 16 depicts the cDNA sequence (SEQ ID NO:41) and
predicted amino acid sequence (SEQ ID NO:61) of murine
TANGO 188.

Figure 17 depicts the cDNA sequence (SEQ ID NO:9) and
predicted amino acid sequence (SEQ ID NO:31) of human
TANGO 189.

Figure 18 depicts the cDNA sequence (SEQ ID NO:42) and
predicted amino acid sequence (SEQ ID NO:62) of murine
TANGO 189.

Figure 19 depicts the cDNA sequence (SEQ ID NO:10) and
predicted amino acid sequence (SEQ ID NO:32) of human
TANGO 215.

Figure 20 depicts the cDNA sequence (SEQ ID NO:11) and
predicted amino acid sequence of human TANGO 187-1/3 (SEQ
ID NO:22).

Figure 21 depicts the cDNA sequence (SEQ ID NO:43;
partial) and predicted amino acid sequence of murine
TANGO 187 (SEQ ID NO:63; partial).

Figure 22 depicts an alignment of the predicted amino
acid sequences of human (SEQ ID NO:23) and murine (SEQ ID
NO:54) TANGO 180.

Figure 23 depicts an alignment of the predicted amino
acid sequences of human (SEQ ID NO:24) and murine (SEQ ID
NO:55; partial) TANGO 181.

Figure 24 depicts an alignment of the predicted amino
acid sequences of human (SEQ ID NO:25) and murine (SEQ ID
NO:56; partial) TANGO 182.

Figure 25 depicts an alignment of the predicted amino
acid sequences of human (SEQ ID NO:26) and murine (SEQ ID
NO:57) TANGO 183.

Figure 26 depicts an alignment of the predicted amino
acid sequences of human (SEQ ID NO:27) and murine (SEQ ID
NO:58) TANGO 184.

Figure 27 depicts an alignment of the predicted amino
acid sequences of human (SEQ ID NO:28) and murine (SEQ ID
NO:59) TANGO 185.

Figure 28 depicts an alignment of the predicted amino
acid sequences of human (SEQ ID NO:29) and murine (SEQ ID
NO:60) TANGO 186.

Figure 29 depicts an alignment of the predicted amino
acid sequences of human (SEQ ID NO:30) and murine (SEQ ID
NO:61) TANGO 188.

Figure 30 depicts an alignment of the predicted amino
acid sequences of human (SEQ ID NO:31) and murine (SEQ ID
NO:62) TANGO 189.

Figure 31 depicts an alignment of the predicted amino
acid sequences of human (SEQ ID NO:33) and murine (SEQ ID
NO:63; partial) TANGO 187.

Figure 32 depicts an alignment of the cDNA sequences of
human (SEQ ID NO:1) and murine (SEQ ID NO:34) TANGO 180.

Figure 33 depicts an alignment of the cDNA sequences of
human (SEQ ID NO:2) and murine (SEQ ID NO:35; partial)
TANGO 181.

Figure 34 depicts an alignment of the cDNA sequences of
human (SEQ ID NO:3) and murine (SEQ ID NO:36; partial)
TANGO 182.

Figure 35 depicts an alignment of the cDNA sequences of
human (SEQ ID NO:4) and murine (SEQ ID NO:37) TANGO 183.

Figure 36 depicts an alignment of the cDNA sequences of
human (SEQ ID NO:5) and murine (SEQ ID NO:38) TANGO 184.

Figure 37 depicts an alignment of the cDNA sequences of
human (SEQ ID NO:6) and murine (SEQ ID NO:39) TANGO 185.

Figure 38 depicts an alignment of the cDNA sequences of
human (SEQ ID NO:7) and murine (SEQ ID NO:40) TANGO 186.

Figure 39 depicts an alignment of the cDNA sequences of
human (SEQ ID NO:8) and murine (SEQ ID NO:41) TANGO 188.

Figure 40 depicts an alignment of the cDNA sequences of
human (SEQ ID NO:9) and murine (SEQ ID NO:42) TANGO 189.

Figure 41 depicts an alignment of the cDNA sequences of
human (SEQ ID NO:11) and murine (SEQ ID NO:43; partial)
TANGO 187.

Figure 42 depicts an alignment of the amino acid
sequences of human TANGO 181 (SEQ ID NO:24), murine TANGO
181 (SEQ ID NO:55; partial), human TANGO 182 (SEQ ID
NO:25), and murine TANGO 182 (SEQ ID NO:56; partial).

Figure 43 depicts an alignment of the amino acid
sequences of human TANGO 184 (SEQ ID NO:27) and human
TANGO 183 (SEQ ID NO:26).

Figure 44 depicts an alignment of the amino acid
sequences of murine TANGO 184 (SEQ ID NO:58) and murine
TANGO 183 (SEQ ID NO:57).

Figure 45 depicts an alignment of the amino acid
sequences of human TANGO 180 (SEQ ID NO:23), murine TANGO
180 (SEQ ID NO:54), agkistrodon PLA2 (SEQ ID NO:109),
acanthahis PLA2 (SEQ ID NO:110), and bovine PLA2 (SEQ ID
NO:111).

Figure 46 depicts the cDNA sequence (SEQ ID NO:___) and
predicted amino acid sequence (SEQ ID NO:___) of TANGO
187-1.

Figure 47 depicts the cDNA sequence (SEQ ID NO:___) and
predicted amino acid sequence (SEQ ID NO:___) of TANGO
187-2/3.

Figure 48 depicts the cDNA sequence (SEQ ID NO:___) and
predicted amino acid sequence (SEQ ID NO:___) of TANGO
187-1/2/3.

Figure 49 depicts the cDNA sequence (SEQ ID NO:___) and
predicted amino acid sequence (SEQ ID NO:___) of TANGO
187-1/2.

Figure 50 depicts the cDNA sequence (SEQ ID NO:___) and
predicted amino acid sequence (SEQ ID NO:___) of TANGO
187-2.

Figure 51 depicts the cDNA sequence (SEQ ID NO:___) and
predicted amino acid sequence (SEQ ID NO:___) of TANGO
187-3.

Figure 52 depicts the cDNA sequence (SEQ ID NO:___) and
predicted amino acid sequence (SEQ ID NO:___) of TANGO
187.

Figure 53 depicts a complete cDNA sequence (SEQ ID
NO:___) and predicted amino acid sequence (SEQ ID NO:___)
of murine TANGO 181.

Figure 54 depicts a complete cDNA sequence (SEQ ID
NO:___) and predicted amino acid sequence (SEQ ID NO:___)
of murine TANGO 182.

Figure 55 depicts a complete cDNA sequence (SEQ ID
NO:___) and predicted amino acid sequence (SEQ ID NO:___)
of murine TANGO 187.

Figure 56 depicts a complete cDNA sequence (SEQ ID
NO:___) and predicted amino acid sequence (SEQ ID NO:___)
of murine TANGO 215.
Detailed Description of the Invention 
The present invention is based on the discovery of cDNA
molecules encoding TANGO 180, TANGO 181, TANGO 182, TANGO
183, TANGO 184, TANGO 185, TANGO 186, TANGO 188, TANGO
189, TANGO 215, and TANGO 187, all of which are predicted
to be either wholly secreted or transmembrane proteins.

TANGO 180

The human TANGO 180 cDNA of SEQ ID NO:1 has a 567
nucleotide open reading frame (SEQ ID NO:12) encoding a
189 amino acid protein (SEQ ID NO:23). The cDNA and
protein sequences of human TANGO 180 are shown in Figure
1.

Human TANGO 180 is predicted to be a wholly secreted
protein having a 22 amino acid signal sequence (amino
acids 1 - 22 of SEQ ID NO:23; SEQ ID NO:64) followed by a
167 amino acid mature protein (amino acids 23 - 189 of
SEQ ID NO:23; SEQ ID NO:76). TANGO 180 is predicted to
have a molecular weight of 21.0 kDa prior to cleavage of
its signal peptide and a molecular weight of 18.5 kDa
subsequent to cleavage of its signal peptide.
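
The length bookkeeping in the two paragraphs above (open reading
frame, precursor, signal sequence, mature protein) can be checked
with a few lines of arithmetic. The sketch below is illustrative
only. It assumes the stated open reading frame length excludes the
stop codon, which is consistent with 567 / 3 = 189, and it uses an
average residue mass of about 110 Da, a common rule of thumb that
is not taken from this specification and that only approximates the
predicted molecular weights. All function names are hypothetical.

```python
AVERAGE_RESIDUE_MASS_DA = 110.0  # rule-of-thumb assumption, not from the text


def orf_to_protein_length(orf_nucleotides: int) -> int:
    """Number of amino acids encoded by an ORF counted without its stop codon."""
    if orf_nucleotides <= 0 or orf_nucleotides % 3 != 0:
        raise ValueError("ORF length must be a positive multiple of 3")
    return orf_nucleotides // 3


def split_precursor(protein_length: int, signal_length: int):
    """Return 1-based inclusive (start, end) ranges for signal and mature parts."""
    if not 0 < signal_length < protein_length:
        raise ValueError("signal sequence must be shorter than the protein")
    signal = (1, signal_length)
    mature = (signal_length + 1, protein_length)
    return signal, mature


def rough_mass_kda(residues: int) -> float:
    """Very rough molecular weight estimate in kDa from residue count alone."""
    return residues * AVERAGE_RESIDUE_MASS_DA / 1000.0


# Human TANGO 180 figures as stated in the text.
precursor_length = orf_to_protein_length(567)            # 189 residues
signal_range, mature_range = split_precursor(precursor_length, 22)
mature_length = mature_range[1] - mature_range[0] + 1    # 167 residues
```

With these inputs the ranges come out as amino acids 1 - 22 and
23 - 189, matching the text; the rough mass estimates land near,
but not exactly on, the predicted 21.0 kDa and 18.5 kDa because
they ignore amino acid composition.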

The murine TANGO 180 cDNA of SEQ ID NO:34 has a 576
nucleotide open reading frame (SEQ ID NO:44) encoding a
192 amino acid protein (SEQ ID NO:54). The cDNA and
protein sequences of murine TANGO 180 are shown in Figure
2.

Figure 22 depicts an alignment of the predicted amino
acid sequences of human (SEQ ID NO:23) and murine (SEQ
ID NO:54) TANGO 180 (88.7% identity). Figure 32 depicts
an alignment of the cDNA sequences of human (SEQ ID NO:1)
and murine (SEQ ID NO:34) TANGO 180 (55% identity).

Northern analysis of human TANGO 180 mRNA expression
revealed the presence of two major transcripts (1.3 and
5.25 kb) and three minor transcripts (0.95, 1.8, and 4.15
kb). This analysis also revealed that all five
transcripts are expressed at a low level in placenta,
lung, and liver; that the 1.3 and the 5.25 kb transcripts
are expressed at a moderate level in brain and kidney;
that the 5.25 kb transcript is expressed at a moderate
level in heart, skeletal muscle, and pancreas; and that
the 1.3 kb transcript is expressed at a high level in
heart, skeletal muscle, and pancreas.

In situ expression analysis of TANGO 180 in adult
murine tissue revealed no significant expression in
bladder, pancreas, heart, thymus, kidney, brain, colon,
placenta, eye, liver, spleen, lung, skeletal
muscle/diaphragm, or small intestine. In situ expression
analysis of murine embryonic tissue revealed expression
in the liver at E13.5 through E16.5. Liver expression
was also observed, although at a lower level, at E17.5
and P1.5.

TANGO 180 maps to human chromosome location 4q25.
TANGO 180 is predicted to have a phospholipase A2
histidine active site domain at amino acids 106-113 of
SEQ ID NO:23 and a phospholipase A2 aspartic acid active
site-like domain at amino acids 124-131 of SEQ ID NO:23.

An apparent genomic sequence of TANGO 180 appears at
GenBank Accession Number AC004067.

Human TANGO 180 bears some similarity to a number of C.
elegans proteins.

TANGO 180 bears some similarity to a number of known
phospholipase A2 (PLA2) proteins (Lambeau et al. (1994)
J. Biol. Chem. 269:1575-78; Lambeau et al. (1995) J.
Biol. Chem. 270:5534-40). TANGO 180 may play a role
similar to that of a phospholipase A2. Figure 45
depicts an alignment of the amino acid sequences of
human TANGO 180 (SEQ ID NO:23), murine TANGO 180 (SEQ ID
NO:54), agkistrodon PLA2 (SEQ ID NO:109), acanthahis PLA2
(SEQ ID NO:110), and bovine PLA2 (SEQ ID NO:111). There
are thought to be at least two important regions within
many PLA2's: CCXXHCCX (histidine at active site) and
LIVMACLIVMFYWPCSTCDXXXXXC (aspartic acid active site).
Various phospholipase A2 proteins are thought to be
involved in inflammation. Moreover, it appears that the
expression and synthesis of at least some phospholipase
A2 proteins are induced by pro-inflammatory modulators
such as interleukin-1, interleukin-6, and tumor necrosis
factor. Thus, TANGO 180 may be involved in inflammation,
e.g., arthritis, endotoxic shock, peritonitis, psoriasis,
acute pancreatitis, and respiratory distress syndrome.
Accordingly, TANGO 180 nucleic acid molecules and
polypeptides as well as anti-TANGO 180 antibodies and
modulators of TANGO 180 expression or activity may be
useful in the treatment of such disorders. Moreover,
PLA2's have been implicated in digestion, airway
contraction, smooth muscle contraction, fertilization,
and cell proliferation. Thus, TANGO 180 nucleic acid
molecules and polypeptides as well as anti-TANGO 180
antibodies and modulators of TANGO 180 expression or
activity may be useful in the treatment of disorders of
digestion, airway contraction, smooth muscle contraction,
fertilization, and cell proliferation.
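
The consensus regions quoted above lend themselves to a simple
pattern scan. The sketch below is a hedged illustration, not the
analysis actually used to predict the TANGO 180 domains: it takes
the histidine active site string exactly as printed (CCXXHCCX),
assumes that X stands for any residue, and reports 1-based start
positions of every window that fits. The helper names and the toy
sequence are hypothetical; no TANGO sequence is reproduced here.

```python
import re


def motif_to_regex(motif: str, wildcard: str = "X") -> str:
    """Translate a motif string into a regular expression.

    Assumption: `wildcard` matches any single residue; every other
    character must match literally.
    """
    return "".join("." if ch == wildcard else re.escape(ch) for ch in motif)


def find_motif(sequence: str, motif: str) -> list:
    """Return 1-based start positions of all (possibly overlapping) matches."""
    # A lookahead makes each match zero-width, so overlapping hits are kept.
    pattern = re.compile("(?=" + motif_to_regex(motif) + ")")
    return [m.start() + 1 for m in pattern.finditer(sequence)]


HISTIDINE_SITE = "CCXXHCCX"   # as printed in the text above
toy_sequence = "MKCCAAHCCGLL"  # made-up example, not a real protein
hits = find_motif(toy_sequence, HISTIDINE_SITE)
```

The pattern covers eight residues, the same width as the span at
amino acids 106-113 that the text assigns to the histidine active
site domain of TANGO 180.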

TANGO 181 

The human TANGO 181 cDNA of SEQ ID NO:2 has a 1017
nucleotide open reading frame (SEQ ID NO:13) encoding a
339 amino acid protein (SEQ ID NO:24). The cDNA and
protein sequences of human TANGO 181 are shown in Figure
3.

Human TANGO 181 is predicted to be a secreted protein
having a 22 amino acid signal sequence (amino acids 1 -
22 of SEQ ID NO:24; SEQ ID NO:65) followed by a 317 amino
acid mature protein (amino acids 23 - 339 of SEQ ID
NO:24; SEQ ID NO:77). TANGO 181 is predicted to have a
molecular weight of 37.8 kDa prior to cleavage of its
signal peptide and a molecular weight of 35.2 kDa
subsequent to cleavage of its signal peptide.

The murine TANGO 181 partial cDNA of SEQ ID NO:35 has a
747 nucleotide open reading frame (SEQ ID NO:45) encoding
a 249 amino acid protein (SEQ ID NO:55). The partial
cDNA and protein sequences of murine TANGO 181 are shown
in Figure 4.

Figure 23 depicts an alignment of the predicted amino
acid sequences of human (SEQ ID NO:24) and murine (SEQ
ID NO:55; partial) TANGO 181 (72.1% identity). Figure 33
depicts an alignment of the cDNA sequences of human (SEQ
ID NO:2) and murine (SEQ ID NO:35; partial) TANGO 181
(65.4% identity). The pair of cysteines at amino acids
76 and 129 might be important for disulfide bond
formation. The single cysteine at amino acid 262 might
enable TANGO 181 to form homodimers (or heterodimers with
TANGO 182).

The cDNA sequence (SEQ ID NO:___) and predicted amino
acid sequence (SEQ ID NO:___) of a full-length murine
TANGO 181 clone are shown in Figure 53.

Northern analysis of human TANGO 181 mRNA expression
revealed the presence of two transcripts (4.3 and 4.5 kb)
expressed at a low level in heart, brain, placenta, lung,
liver, skeletal muscle, kidney, and pancreas, with the
level of expression in the pancreas being higher than in
the other tissues.

Murine in situ expression analysis revealed that TANGO
181 is weakly expressed in adult brain (choroid plexus
and olfactory bulb). This analysis also revealed TANGO
180 expression in the liver and kidney (medulla). High
level TANGO 180 expression was observed in testis. This
analysis detected little or no expression of TANGO 181 in
adult liver, ovary, heart, lung, spleen, fat, muscle,
skin, stomach, duodenum, colon, pancreas, thymus,
pituitary, and eye. In situ expression analysis of
embryos revealed that TANGO 181 is ubiquitously expressed
at stages E12.5, E13.5, and E14.5.

TANGO 181 maps to human chromosome location 8p12.
WI-5768 and AFMB057WG5 are markers which flank TANGO 181.
Nearby loci include WRN (Werner Syndrome) and SPG5A
(Spastic Paraplegia 5A), and nearby known genes include
FGFR1 (fibroblast growth factor receptor), STAR
(steroidogenic acute regulatory protein), ANK1 (ankyrin
1), CALB1 (calbindin 1), and CHRNB3 (cholinergic receptor,
nicotinic). The human chromosomal location corresponds
to a position on mouse chromosome 8 near fgfr1
(fibroblast growth factor receptor), cym (cyritesin 1),
tissue plasminogen activator, and ank (ankyrin 1).

Within the 3' untranslated region of the human TANGO
181 cDNA described above is a 260 base pair sequence
(GenBank Accession Number Z36802) previously identified
as part of a gene that appears to be preferentially
expressed in pancreatic cancer and chronic pancreatitis
(Gress et al. (1996) Oncogene 13:1819-30). Thus, TANGO
181 nucleic acids and polypeptides may be useful for the
diagnosis and/or treatment of chronic pancreatitis and
pancreatic cancer (as well as other cancers). In
addition, modulators of TANGO 181 expression or activity
may be useful in the treatment of such disorders.

TANGO 181 and TANGO 182 are highly homologous to the C.
elegans protein C42C1.9.

TANGO 182

The human TANGO 182 cDNA of SEQ ID NO:3 has a 1044
nucleotide open reading frame (SEQ ID NO:14) encoding a
348 amino acid protein (SEQ ID NO:25). The cDNA and
protein sequences of human TANGO 182 are shown in Figure
5.

Human TANGO 182 is predicted to be a secreted protein
having a 23 amino acid signal sequence (amino acids 1 -
23 of SEQ ID NO:25; SEQ ID NO:66) followed by a 325 amino
acid mature protein (amino acids 24 - 348 of SEQ ID
NO:25; SEQ ID NO:78). TANGO 182 is predicted to have a
molecular weight of 39.2 kDa prior to cleavage of its
signal peptide and a molecular weight of 36.1 kDa
subsequent to cleavage of its signal peptide.

The murine TANGO 182 partial cDNA of SEQ ID NO:36 has
an 825 nucleotide open reading frame (SEQ ID NO:46)
encoding a 275 amino acid protein (SEQ ID NO:56). The
partial cDNA and protein sequences of murine TANGO 182
are shown in Figure 6. Figure 24 depicts an alignment
of the predicted amino acid sequences of human (SEQ ID
NO:25) and murine (SEQ ID NO:56; partial) TANGO 182
(75.1% identity). Figure 34 depicts an alignment of the
cDNA sequences of human (SEQ ID NO:3) and murine (SEQ ID
NO:36; partial) TANGO 182 (67.6% identity). The pair of
cysteines at amino acids 78 and 130 might be important
for disulfide bond formation. The single cysteine at
amino acid 312 might enable TANGO 182 to form homodimers
(or heterodimers with TANGO 181).
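The lengths quoted above for TANGO 182 are related by simple arithmetic, which can serve as a consistency check when such figures are transcribed. The following Python sketch is illustrative only. It assumes that the quoted open reading frame lengths exclude the stop codon (as the 1044 nucleotide / 348 amino acid and 825 nucleotide / 275 amino acid figures imply), and the function names are hypothetical rather than part of the disclosure.

```python
def orf_to_protein_length(orf_nt: int) -> int:
    """Return the number of amino acids encoded by an ORF of orf_nt nucleotides.

    Assumption: the quoted ORF length excludes the stop codon, so every
    three nucleotides correspond to exactly one amino acid.
    """
    if orf_nt <= 0 or orf_nt % 3 != 0:
        raise ValueError("ORF length must be a positive multiple of 3")
    return orf_nt // 3


def mature_length(protein_aa: int, signal_aa: int) -> int:
    """Length of the mature protein once the signal sequence is removed."""
    if not 0 <= signal_aa < protein_aa:
        raise ValueError("signal sequence must be shorter than the protein")
    return protein_aa - signal_aa


# Figures quoted above for human TANGO 182.
human_protein = orf_to_protein_length(1044)
human_mature = mature_length(human_protein, 23)
print(human_protein, human_mature)  # 348 325

# Figures quoted above for the partial murine TANGO 182 clone.
print(orf_to_protein_length(825))  # 275
```

The same check applies to the other open reading frame and protein lengths recited in this section.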

The cDNA sequence (SEQ ID NO: ) and predicted amino
acid sequence (SEQ ID NO: ) of a full-length murine
TANGO 182 clone are shown in Figure 54.

TANGO 182 maps to human chromosomal location 10q24
between markers D10S566 and D10S540. In mice, TANGO 182
maps to chromosome 10 between D10S198 and D10S192 (129.8
to 131.2 cM).

Northern analysis of human TANGO 182 mRNA expression
revealed the presence of a 2.8 kb transcript that is
expressed at a high level in placenta and a somewhat lower
level in liver, kidney, and pancreas. This transcript is
expressed at a low level in heart, brain, lung, and
skeletal muscle.

Murine in situ expression analysis revealed that TANGO
182 is expressed at a high level in testis in adult mice.
Little or no expression was detected in adult brain,
liver, kidney, ovary, heart, lung, spleen, fat, muscle,
skin, stomach, duodenum, colon, pancreas, thymus,
pituitary, or eye by in situ analysis. In situ
expression analysis of embryos revealed ubiquitous, low
level expression at stages E12.5, E13.5, and E14.5.

Both human and mouse TANGO 182 are quite similar to
human and murine TANGO 181 at the amino acid level
(Figure 42). Thus, TANGO 182, like TANGO 181, may be
useful for the diagnosis and/or treatment of pancreatic
cancer and chronic pancreatitis as well as other cancers.
In addition, TANGO 182 bears some similarity to a C.
elegans protein C42C1.9 (GenBank Accession Number
AF043695) that is encoded by a gene that is present in
the same operon as a gene encoding a mitochondrial
carrier protein. Since genes within the same operon are
often co-regulated and encode proteins involved in the
same physiological state, TANGO 182 may play a role in
metabolism. Thus, TANGO 182 nucleic acids and
polypeptides as well as antibodies directed against TANGO
182 may be useful in the diagnosis and treatment of
metabolic disorders. In addition, modulators of TANGO
182 expression or activity may be useful in the treatment
of such disorders.

TANGO 183

The human TANGO 183 cDNA of SEQ ID NO:4 has a 549
nucleotide open reading frame (SEQ ID NO:15) encoding a
183 amino acid protein (SEQ ID NO:26). The cDNA and
protein sequences of human TANGO 183 are shown in Figure
7.

Human TANGO 183 is predicted to be a transmembrane
protein having a 20 amino acid signal sequence (amino
acids 1 - 20 of SEQ ID NO:26; SEQ ID NO:67) followed by a
163 amino acid mature protein (amino acids 21 - 183 of
SEQ ID NO:26; SEQ ID NO:79) having a 69 amino acid
extracellular domain (amino acids 21 - 89 of SEQ ID
NO:26; SEQ ID NO:88), a 23 amino acid transmembrane
domain (amino acids 90 - 112 of SEQ ID NO:26; SEQ ID




NO:94), and a 71 amino acid cytoplasmic domain (amino
acids 113 - 183 of SEQ ID NO:26; SEQ ID NO:102). There
are 8 conserved cysteines in the extracellular domain.
TANGO 183 has a high proportion of charged amino acids in
the predicted extracellular (18%, not including
histidines) and cytoplasmic (32%) domains. Human TANGO
183 is predicted to have a molecular weight of 20.6 kDa
prior to cleavage of its signal peptide and a molecular
weight of 18.1 kDa subsequent to cleavage of its signal
peptide.
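The proportion of charged amino acids in a domain, as quoted above, can be tallied as in the following Python sketch. It assumes that "charged" means aspartate, glutamate, lysine, and arginine, with histidine counted only on request. The example sequence is a hypothetical placeholder and not a TANGO 183 sequence.

```python
# Residues treated as charged at physiological pH (an assumption of this sketch).
CHARGED = frozenset("DEKR")  # Asp, Glu, Lys, Arg


def charged_fraction(seq: str, include_histidine: bool = False) -> float:
    """Return the fraction of residues in seq that are charged.

    seq is a one-letter amino acid string; histidine is counted only when
    include_histidine is True.
    """
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    charged = (CHARGED | {"H"}) if include_histidine else CHARGED
    return sum(1 for residue in seq if residue in charged) / len(seq)


toy_domain = "MKRDEALGHS"  # hypothetical 10-residue sequence
print(charged_fraction(toy_domain))                          # 0.4
print(charged_fraction(toy_domain, include_histidine=True))  # 0.5
```

Applied to the residues of a predicted domain, the function returns a value comparable to the percentages recited above once multiplied by 100.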

The murine TANGO 183 cDNA of SEQ ID NO:37 has a 549
nucleotide open reading frame (SEQ ID NO:47) encoding a
183 amino acid protein (SEQ ID NO:57). The cDNA and
protein sequences of murine TANGO 183 are shown in Figure
8.

Figure 25 depicts an alignment of the predicted amino
acid sequences of human (SEQ ID NO:26) and murine (SEQ
ID NO:57) TANGO 183 (97.3% identity). Figure 35 depicts
an alignment of the cDNA sequences of human (SEQ ID NO:4)
and murine (SEQ ID NO:37) TANGO 183 (71.7% identity).
The conserved cysteine residues are particularly
important and are preferably retained in functional
variants.

Northern analysis of human TANGO 183 mRNA expression
revealed the presence of a 1.6 kb transcript that is
expressed at a high level in brain, kidney, pancreas, and
heart; at a moderate level in liver and skeletal muscle;
and at a low level in placenta and lung.

The nucleic acid sequence of TANGO 183 is related to a
sequence tagged site at chromosomal location 11p15.4, and
TANGO 183 may map to this site.

The predicted cytoplasmic domain of TANGO 183 has a
relatively high number of charged residues (32%). This
suggests that TANGO 183 may non-covalently, e.g.,
electrostatically, associate with an intracellular
molecule such as a cytoskeletal component. Accordingly,
TANGO 183 may itself be involved in maintaining the
structural integrity of cells in which it is expressed.
If so, aberrant TANGO 183 protein or aberrantly regulated
TANGO 183 could be involved in alterations in cellular
morphology, e.g., alterations associated with metastasis.
Accordingly, TANGO 183 nucleic acid molecules and
polypeptides as well as anti-TANGO 183 antibodies and
modulators of TANGO 183 expression or activity may be
useful in the treatment of disorders associated with
aberrant cell development or cell differentiation, e.g.,
cancer, or cell migration, e.g., tumor metastasis.

TANGO 183 and TANGO 184 are related and may play
similar functional roles. Figure 43 depicts an alignment
of the amino acid sequences of human TANGO 184 (SEQ ID
NO:27) and human TANGO 183 (SEQ ID NO:26). Figure 44
depicts an alignment of the amino acid sequences of
murine TANGO 184 (SEQ ID NO:58) and murine TANGO 183 (SEQ
ID NO:57).

TANGO 183 is related to C. elegans R12C12.6 (GenBank
Accession No. U23510).

TANGO 184

The human TANGO 184 cDNA of SEQ ID NO:5 has a 594
nucleotide open reading frame (SEQ ID NO:16) encoding a
198 amino acid protein (SEQ ID NO:27). The cDNA and
protein sequences of human TANGO 184 are shown in Figure
9.

Human TANGO 184 is predicted to be a transmembrane
protein having a 28 amino acid signal sequence (amino
acids 1 - 28 of SEQ ID NO:27; SEQ ID NO:68) followed by a
170 amino acid mature protein (amino acids 29 - 198 of
SEQ ID NO:27; SEQ ID NO:80) having a 74 amino acid
extracellular domain (amino acids 29 - 102 of SEQ ID
NO:27; SEQ ID NO:89), a 23 amino acid transmembrane domain
(amino acids 103 - 125 of SEQ ID NO:27; SEQ ID NO:95),
and a 73 amino acid cytoplasmic domain (amino acids 126 -
198 of SEQ ID NO:27; SEQ ID NO:103). TANGO 184 has a
high proportion of charged amino acids in the predicted
extracellular (31%) and cytoplasmic (29%) domains.
Notably, the transmembrane regions include charged
residues. Human TANGO 184 is predicted to have a
molecular weight of 22.5 kDa prior to cleavage of its
signal peptide and a molecular weight of 18.9 kDa
subsequent to cleavage of its signal peptide.

The murine TANGO 184 cDNA of SEQ ID NO:38 has a 357
nucleotide open reading frame (SEQ ID NO:48) encoding a
199 amino acid protein (SEQ ID NO:58). The cDNA and
protein sequences of murine TANGO 184 are shown in Figure
10.

Figure 26 depicts an alignment of the predicted amino
acid sequences of human (SEQ ID NO:27) and murine (SEQ
ID NO:58) TANGO 184 (94.5% identity). Figure 36 depicts
an alignment of the cDNA sequences of human (SEQ ID NO:5)
and murine (SEQ ID NO:38) TANGO 184 (63.8% identity).

Northern analysis of human TANGO 184 mRNA expression
revealed the presence of a 2 kb transcript that is
expressed at a high level in heart, brain, placenta,
skeletal muscle, kidney, and pancreas; and at a low level
in lung and liver. There are two alternative polyA
sites: nucleotide 1000 and nucleotide 2000.

In situ analysis of TANGO 184 expression in adult mice
revealed expression in the brain (moderate, ubiquitous
expression), spinal cord (weak expression in the region
of the grey matter), submandibular gland (strong,
ubiquitous expression), stomach (weak expression in the
muscle region), kidney (weak, ubiquitous expression in
the cortex and medulla, stronger expression in papilla),
adrenal gland (weak ubiquitous expression), thymus (weak
expression in cortex), lymph node (moderate ubiquitous
expression), spleen (weak expression in follicles),
skeletal muscle/smooth muscle (diaphragm), testis (strong
expression in the area surrounding the seminiferous
tubules), ovaries (weak expression), and placenta
(moderate, ubiquitous expression). This analysis did not
reveal significant expression in white fat, brown fat,
heart, lung, liver, pancreas, colon, small intestine, and
bladder. In embryonic tissue, this analysis revealed
expression at E13.5 (weak to moderate ubiquitous
expression with higher expression in the brain and
liver), E14.5 (weak to moderate ubiquitous expression
with higher expression in the brain and liver), E15.5
(moderate ubiquitous expression with higher expression in
the brain), E16.5 (weak to moderate ubiquitous expression
with higher expression in the brain, spinal cord, brown
fat, submandibular gland, lung, stomach, and intestines),
E18.5 (weak to moderate ubiquitous expression with higher
expression in the brain, spinal cord, brown fat,
submandibular gland, lung, stomach, and intestines), and
P1.5 (weak ubiquitous expression with higher expression in
brain, submandibular gland, olfactory epithelium, and
stomach).

The predicted cytoplasmic domain of TANGO 184 has a
relatively high number of charged residues (29%). This
suggests that TANGO 184 may non-covalently, e.g.,
electrostatically, associate with an intracellular
molecule such as a cytoskeletal component. Accordingly,
TANGO 184 may itself be involved in maintaining the
structural integrity of cells in which it is expressed.
If so, aberrant TANGO 184 protein or aberrantly regulated
TANGO 184 could be involved in alterations in cellular
morphology, e.g., alterations associated with metastasis.
Accordingly, TANGO 184 nucleic acid molecules and
polypeptides as well as anti-TANGO 184 antibodies and
modulators of TANGO 184 expression or activity may be
useful in the treatment of disorders associated with
aberrant cell development or cell differentiation, e.g.,
cancer, or cell migration, e.g., tumor metastasis.

TANGO 185 

The human TANGO 185 cDNA of SEQ ID NO:6 has a 579
nucleotide open reading frame (SEQ ID NO:17) encoding a
193 amino acid protein (SEQ ID NO:28). The cDNA and
protein sequences of human TANGO 185 are shown in Figure
11.

Human TANGO 185 is predicted to be a transmembrane
protein having a 24 amino acid signal sequence (amino
acids 1 - 24 of SEQ ID NO:28; SEQ ID NO:69) followed by a
169 amino acid mature protein (amino acids 25 - 193 of
SEQ ID NO:28; SEQ ID NO:81) having two extracellular
domains, one having 51 amino acids (amino acids 25 - 75
of SEQ ID NO:28; SEQ ID NO:90), and a second having 19
amino acids (amino acids 132 - 150 of SEQ ID NO:28; SEQ
ID NO:91); three transmembrane domains, one having 27
amino acids (amino acids 76 - 102 of SEQ ID NO:28; SEQ ID
NO:96), a second having 22 amino acids (amino acids 110 -
131 of SEQ ID NO:28; SEQ ID NO:97), the third having 24
amino acids (amino acids 151 - 174 of SEQ ID NO:28; SEQ
ID NO:98); and two cytoplasmic domains, one having 7
amino acids (amino acids 103 - 109 of SEQ ID NO:28; SEQ
ID NO:104), and a second having 19 amino acids (amino
acids 175 - 193 of SEQ ID NO:28; SEQ ID NO:105). The
predicted 22 amino acid transmembrane domain and the
predicted 24 amino acid domain, along with the predicted
7 amino acid cytoplasmic domain may form one hydrophobic
domain that passes through the membrane twice. TANGO 185
is predicted to have a molecular weight of 21.4 kDa prior
to cleavage of its signal peptide and a molecular weight
of 18.8 kDa subsequent to cleavage of its signal peptide.
Notably, the transmembrane regions have charged residues.

The murine TANGO 185 cDNA of SEQ ID NO:39 has a 579
nucleotide open reading frame (SEQ ID NO:49) encoding a
193 amino acid protein (SEQ ID NO:59). The cDNA and
protein sequences of murine TANGO 185 are shown in Figure
12.
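The domain coordinates recited above for human TANGO 185 can be checked for internal consistency, i.e., that the signal sequence and the predicted domains cover amino acids 1 - 193 without gaps or overlaps. The following Python sketch is one way such a check might be written; the data structure and function names are hypothetical, and the coordinates are simply those quoted above.

```python
from typing import List, Tuple

# (label, first residue, last residue), numbered from 1, both ends inclusive.
Domain = Tuple[str, int, int]

TANGO_185_DOMAINS: List[Domain] = [
    ("signal sequence", 1, 24),
    ("extracellular 1", 25, 75),
    ("transmembrane 1", 76, 102),
    ("cytoplasmic 1", 103, 109),
    ("transmembrane 2", 110, 131),
    ("extracellular 2", 132, 150),
    ("transmembrane 3", 151, 174),
    ("cytoplasmic 2", 175, 193),
]


def domain_length(domain: Domain) -> int:
    """Number of residues in a domain with inclusive coordinates."""
    _, start, end = domain
    return end - start + 1


def tiles_protein(domains: List[Domain], protein_length: int) -> bool:
    """True if the domains cover residues 1..protein_length exactly once."""
    expected_start = 1
    for _, start, end in sorted(domains, key=lambda d: d[1]):
        if start != expected_start or end < start:
            return False  # gap, overlap, or inverted coordinates
        expected_start = end + 1
    return expected_start == protein_length + 1


print([domain_length(d) for d in TANGO_185_DOMAINS])
print(tiles_protein(TANGO_185_DOMAINS, 193))  # True
```

The individual lengths computed this way (24, 51, 27, 7, 22, 19, 24, and 19 amino acids) agree with the domain sizes stated in the text.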

Figure 27 depicts an alignment of the predicted amino
acid sequences of human (SEQ ID NO:28) and murine (SEQ
ID NO:59) TANGO 185 (90.7% identity). Figure 37 depicts
an alignment of the cDNA sequences of human (SEQ ID NO:6)
and murine (SEQ ID NO:39) TANGO 185 (71.1% identity).

Human TANGO 185 maps to chromosome 6.

Northern analysis of human TANGO 185 mRNA expression
revealed the presence of a 2.2 kb major transcript and a
4.2 kb minor transcript. This analysis also revealed
that the 2.3 kb transcript is expressed at a high level
in heart, placenta, and pancreas; at a moderate level in
lung, liver, and kidney; and at a very low level, if at
all, in brain and skeletal muscle. The 4.2 kb transcript
is expressed at a low level in placenta.

In situ analysis of TANGO 185 expression in adult mice
revealed expression in the brain (choroid plexus),
submandibular gland (ubiquitous expression), white fat
(weak expression, possible mammary gland expression),
stomach (mucosal epithelium), kidney (medulla-cortex
transition and medullary rays), colon (weak expression in
the epithelium), small intestine (villi), thymus (low
level expression), bladder (mucosal epithelium), and
placenta (ubiquitous expression in decidua region). This
analysis did not reveal significant expression in adult
eye and harderian gland, brown fat, heart, lung, liver,
spleen, pancreas, skeletal muscle, testes, and ovaries.

In situ analysis of TANGO 185 embryonic expression in
mice revealed expression at E13.5 (high level expression
in the skin and submaxillary gland and low level ubiquitous
expression in the liver); E14.5 (high level expression in
the choroid plexus of the lateral and fourth ventricles,
skin, epithelium of the oral cavity, follicles of
vibrissa, submaxillary gland, stomach, and heart;
expression in lung (especially the developing large
airways) and liver (ubiquitous expression)). At E15.5 the
observed expression pattern is nearly identical to that
at E14.5 except that there is expression in the region
outlining the intestinal tract and lung expression is
ubiquitous with higher expression in the region outlining
the large airways.

At E16.5 high level expression is observed in skin,
choroid plexus, the lining of the oral and nasal cavity,
esophagus, bladder, stomach, intestine, large vessels of
the heart, large airways of the lung, and the region
outlining the vertebrae. Lower ubiquitous expression is
present in the heart, lung and thymus. A somewhat
higher, multifocal expression is present in the thymus.

At E18.5 the expression pattern is identical to that
observed at E16.5 except that expression is also observed
in developing hair follicles.

At P1.5 the expression pattern is identical to that
observed at E16.5 except that there is no longer
significant expression in the region outlining the
vertebrae.

The expression pattern of TANGO 185 during embryonic
development suggests that TANGO 185 expression is
strongly associated with squamous and mucosal epithelial
cells.

The expression pattern of TANGO 185 suggests that it is
involved in cell development and/or cell differentiation.
Accordingly, TANGO 185 nucleic acid molecules and
polypeptides as well as anti-TANGO 185 antibodies and
modulators of TANGO 185 expression or activity may be
useful in the treatment of disorders associated with
aberrant cell development or cell differentiation, e.g.,
cancer. There is evidence that TANGO 185 is expressed in
prostate cells. Thus, TANGO 185 nucleic acid molecules
and polypeptides as well as anti-TANGO 185 antibodies and
modulators of TANGO 185 expression or activity may be
useful in the treatment of prostate cancer.

TANGO 186 

The human TANGO 186 cDNA of SEQ ID NO:7 has a 1149
nucleotide open reading frame (SEQ ID NO:18) encoding a
383 amino acid protein (SEQ ID NO:29). The cDNA and
protein sequences of human TANGO 186 are shown in Figure
13.

Human TANGO 186 is predicted to be a secreted protein
having a 20 amino acid signal sequence (amino acids 1 -
20 of SEQ ID NO:29; SEQ ID NO:70) followed by a 363 amino
acid mature protein (amino acids 21 - 383 of SEQ ID
NO:29; SEQ ID NO:82). There are eight cysteines in
mature TANGO 186. Some or all of these might be involved
in disulfide bond formation. Human TANGO 186 is
predicted to have a molecular weight of 43.0 kDa prior to
cleavage of its signal peptide and a molecular weight of
40.3 kDa subsequent to cleavage of its signal peptide.
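Predicted molecular weights of the kind quoted above are commonly estimated by summing average residue masses and adding one water molecule for the free amino and carboxyl termini. The Python sketch below assumes that approach; the method actually used to obtain the figures above is not stated. It uses standard average residue masses, ignores post-translational modifications, and takes hypothetical peptides rather than a TANGO 186 sequence as input.

```python
# Standard average masses (in daltons) of amino acid residues, i.e., of each
# amino acid minus the water lost on peptide bond formation.
AVERAGE_RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886,
    "C": 103.1388, "E": 129.1155, "Q": 128.1307, "G": 57.0519,
    "H": 137.1411, "I": 113.1594, "L": 113.1594, "K": 128.1741,
    "M": 131.1926, "F": 147.1766, "P": 97.1167, "S": 87.0782,
    "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER_MASS = 18.01528


def molecular_weight_da(seq: str) -> float:
    """Estimate the average molecular weight of an unmodified polypeptide."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    try:
        residues = sum(AVERAGE_RESIDUE_MASS[aa] for aa in seq)
    except KeyError as exc:
        raise ValueError(f"unknown residue: {exc.args[0]}") from None
    return residues + WATER_MASS


def weights_before_and_after_cleavage(seq: str, signal_aa: int):
    """Return (full-length kDa, mature kDa) for a signal peptide of signal_aa residues."""
    if not 0 < signal_aa < len(seq):
        raise ValueError("signal peptide must be shorter than the protein")
    return (molecular_weight_da(seq) / 1000.0,
            molecular_weight_da(seq[signal_aa:]) / 1000.0)


# Hypothetical 12-residue peptide with a 4-residue "signal" segment.
print(weights_before_and_after_cleavage("MKLLGASDEKRW", 4))
```

Values computed in this way are estimates; observed molecular weights may differ, for example because of glycosylation.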

The murine TANGO 186 cDNA of SEQ ID NO:40 has a 1146
nucleotide open reading frame (SEQ ID NO:50) encoding a
382 amino acid protein (SEQ ID NO:60). The cDNA and
protein sequences of murine TANGO 186 are shown in Figure
14. Conserved cysteine residues are particularly
important and are preferably retained in functional
variants.

Figure 28 depicts an alignment of the predicted amino
acid sequences of human (SEQ ID NO:29) and murine (SEQ
ID NO:60) TANGO 186 (90.9% identity). Figure 38 depicts
an alignment of the cDNA sequences of human (SEQ ID NO:7)
and murine (SEQ ID NO:40) TANGO 186 (41.6% identity).
The human and murine TANGO 186 proteins are highly
similar except within three portions: the signal
sequence, a hinge region at amino acids 108-123, and a
hinge region at amino acids 198-216. Within these three
portions the proteins are only about 50% identical.
Outside of these three portions the proteins are about
97.3% identical.
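The comparison above, in which identity is reported separately for particular portions of the alignment, can be sketched in Python as follows. The sketch assumes a pre-computed pairwise alignment in which gaps are written as "-", counts only columns in which both sequences have a residue, and addresses regions by alignment column rather than by residue number. Other conventions for the denominator exist, and the toy sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over the gap-free columns of a pairwise alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # skip columns containing a gap
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * identical / compared


def region_identity(aligned_a: str, aligned_b: str, first: int, last: int) -> float:
    """Percent identity within alignment columns first..last (1-based, inclusive)."""
    if not 1 <= first <= last <= len(aligned_a):
        raise ValueError("region lies outside the alignment")
    return percent_identity(aligned_a[first - 1:last], aligned_b[first - 1:last])


# Hypothetical aligned sequences (not TANGO 186 sequences).
seq_a = "MKTLLVAGCD"
seq_b = "MKSLLVA-CD"
print(round(percent_identity(seq_a, seq_b), 1))
print(round(region_identity(seq_a, seq_b, 1, 3), 1))
```

Calling `region_identity` on each of the three portions, and `percent_identity` on the remaining columns, would yield figures analogous to the approximate 50% and 97.3% values recited above.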

TANGO 186 maps to human chromosome 11q14.

Northern analysis of human TANGO 186 mRNA expression
revealed the presence of a 1.8 kb transcript and a 4 kb
transcript. Both transcripts are expressed at a low
level in heart, lung, liver, skeletal muscle, kidney, and
pancreas and at a very low level in brain.

In situ analysis of TANGO 186 in adult mice revealed
that TANGO 186 is expressed in brain (olfactory bulb),
spleen (low level ubiquitous signal), small intestine
(very strong signal in villi and submucosa), colon
(ubiquitous signal), kidney (cortical and medullary
region), lung (bronchial epithelium), eye (iris and
cornea), and placenta (strong signal in the outer membrane).
This analysis did not detect expression in adult
pancreas, heart, skeletal muscle, diaphragm, esophagus,
liver, and thymus.

In situ expression analysis of murine embryonic
sagittal sections revealed expression at stage E13.5 in
epithelium of the lower and upper lip, cartilage
primordium of basisphenoid bone, cartilage condensation
of sacral vertebral body (centrum), small intestine, and
heart. At stage E14.5, in addition to the expression
observed at stage E13.5, expression was also observed in
eye (or cartilage around eye), Meckel's cartilage, and
cartilage of the limb digits. At stage E15.5 expression
was observed in vibrissae of the snout, kidney (embryonic
glomeruli), cartilage of the limb digits, cartilage of
the vertebral column, heart, eye, and small intestine.
At stage E16.5 the observed expression pattern was
similar to that observed at E15.5, but there was a
notable reduction in signal from cartilage, epithelium of
upper and lower lip, and heart. Also at stage E16.5 low
level signal was observed in the lung, and a strong
signal was still observed in the small intestine. At
stage E17.5 expression of TANGO 186 was observed to be
more ubiquitous. However, expression in cartilage was
observed to decrease with the exception of ossification
within cartilage primordium of body of mandible. At
stage E17.5 strong expression continued to be observed in
the small intestine. The expression pattern at stage
P1.5 was observed to be very similar to that observed at
stage E17.5 with expression being nearly ubiquitous with
the notable exceptions of the brain and spinal cord in
which little or no expression was observed. At stage
P1.5 the highest expression observed was in the
small intestine, lung, and kidney.

Overall, the in situ expression analysis of adult and
embryonic tissue revealed that expression is first
observed in the developing cartilage, small intestine,
and heart with the cartilage expression being most
striking in the developing vertebral column and jaw area.
Strong expression in the cartilage of the vertebral
column and developing digits was observed through stage
E16.5. Subsequently, cartilage expression was observed
to decrease with some exceptions in the jaw area. Other
embryonic tissues in which the observed expression was
notable include the kidney, specifically the embryonic
glomeruli, and the lung. These tissues continue to have
strong expression in the adult with expression in the
kidney also being observed in the medullary region and
lung expression becoming restricted to the bronchial
epithelium. Expression of TANGO 186 becomes more
ubiquitous through P1.5 with the most noticeable
exception being the brain and spinal cord. In the adult,
however, signal is observed in the olfactory bulb.

In a murine LPS disease model, increased TANGO 186
expression was observed in the brain 2 and 8 hours after
LPS treatment. Decreased TANGO 186 expression was
observed at these same time points in the kidney. TANGO
186 expression was also observed in the gastric mucosa.

As discussed above, murine in situ expression analysis
demonstrates that TANGO 186 is expressed in cartilage
throughout the embryo, suggesting that TANGO 186 is a
regulatory molecule that plays a role in bone formation
(e.g., condensation of cartilage). Accordingly, TANGO
186 nucleic acid molecules and polypeptides as well as
anti-TANGO 186 antibodies and modulators of TANGO 186
expression or activity may be useful in the diagnosis and
treatment of bone and cartilage disorders (e.g.,
osteogenesis imperfecta and broken bones, cartilage
degradation, and bone degradation). Moreover, many bone
morphogenic proteins and TGF-β family members are
regulated by extracellular proteins, e.g., noggin and
chordin. Thus, TANGO 186, which is expressed in the
heart, may play a role in heart development, and TANGO
186 nucleic acid molecules and polypeptides as well as
anti-TANGO 186 antibodies and modulators of TANGO 186
expression or activity may be useful in the diagnosis and
treatment of developmental disorders of the heart, e.g.,
valve malformation.

There is some sequence similarity between TANGO 186 and
a Bacillus serine protease. Thus, TANGO 186 may have
serine protease activity.

TANGO 188 

The human TANGO 188 cDNA of SEQ ID NO:8 has a 792
nucleotide open reading frame (SEQ ID NO:19) encoding a
264 amino acid protein (SEQ ID NO:30). The cDNA and
protein sequences of human TANGO 188 are shown in Figure
15.

Human TANGO 188 is predicted to be a secreted protein
having a 23 amino acid signal sequence (amino acids 1 -
23 of SEQ ID NO:30; SEQ ID NO:71) followed by a 241 amino
acid mature protein (amino acids 24 - 264 of SEQ ID
NO:30; SEQ ID NO:83). Human TANGO 188 is predicted to
have a molecular weight of 29.5 kDa prior to cleavage of
its signal peptide.

The murine TANGO 188 cDNA of SEQ ID NO:41 has an 807
nucleotide open reading frame (SEQ ID NO:51) encoding a
269 amino acid protein (SEQ ID NO:61). The cDNA and
protein sequences of murine TANGO 188 are shown in Figure
16.

Figure 29 depicts an alignment of the predicted amino
acid sequences of human (SEQ ID NO:30) and murine (SEQ
ID NO:61) TANGO 188 (80.5% identity). Figure 39 depicts
an alignment of the cDNA sequences of human (SEQ ID NO:8)
and murine (SEQ ID NO:41) TANGO 188 (71.8% identity).

TANGO 188 maps to human chromosome 16p13.3.

Northern analysis of human TANGO 188 mRNA expression
revealed the presence of a 2.0 kb transcript that is
expressed at a low level in heart and pancreas and at a
very low level, if at all, in brain, placenta, lung,
liver, skeletal muscle, and kidney.

In situ analysis of TANGO 188 expression in adult mice
did not detect significant expression in the bladder,
placenta, pancreas, eye, heart, liver, thymus, spleen,
kidney, lung, brain, skeletal muscle/diaphragm, colon, or
small intestine. In situ analysis of TANGO 188
expression in embryos revealed no significant expression
at E13.5, E14.5, E15.5, E16.5, E17.5, or P1.5. However,
in the case of both adult mice and embryos, expression of
TANGO 188 may have been obscured by a high background
signal.

TANGO 188 is transcribed in an anti-sense relationship
to NY-CO-7 (Scanlon et al. (1998) Int. J. Cancer 76:652-
58). Accordingly, TANGO 188 may have utility as a marker
for colon cancer, and TANGO 188 nucleic acid molecules
and polypeptides as well as anti-TANGO 188 antibodies and
modulators of TANGO 188 expression or activity may be
useful in the diagnosis and treatment of colon cancer or
other types of cancer.

The gene encoding the C. elegans homologue of NY-CO-7
is present in the same operon as a gene encoding a
mitochondrial import protein. Since genes within the
same operon are often co-regulated and encode proteins
involved in the same physiological state, TANGO 188 may
be a mitochondrial import protein or may be involved in
some other mitochondrial function. Thus, TANGO 188
nucleic acids and polypeptides as well as antibodies
directed against TANGO 188 and modulators of TANGO 188
expression or activity may be useful in the diagnosis and
treatment of disorders associated with defects in
mitochondrial function.

TANGO 188 appears to be the homologue of a C. elegans
protein that is present in the same operon as a gene
encoding a protein that bears some similarity to SnP8p, a
yeast zinc finger protein that is likely a transcription
factor involved in expression of genes encoding certain
proteins involved in respiration and metabolism. Since
genes within the same operon are often co-regulated and
encode proteins involved in the same physiological state,
TANGO 188 may play a role in respiration or metabolism.
Thus, TANGO 188 nucleic acids and polypeptides as well as
antibodies directed against TANGO 188 and modulators of
TANGO 188 expression or activity may be useful in the
diagnosis and treatment of disorders associated with
defects in cell respiration or metabolism.

TANGO 189

The human TANGO 189 cDNA of SEQ ID NO:9 has a 759
nucleotide open reading frame (SEQ ID NO:20) encoding a
253 amino acid protein (SEQ ID NO:31). The cDNA and
protein sequences of human TANGO 189 are shown in Figure
17.

The human TANGO 189 cDNA described above (SEQ ID NO:9;
Figure 17) represents one splice variant of TANGO 189
(splice variant 1A). There exists a second splice
variant of human TANGO 189 (splice variant 1B). The cDNA
sequence of this splice variant is the same as the cDNA
sequence of human TANGO 189 described above, except that
nucleotides 674-1087 are missing. This splice variant
cDNA encodes a 184 amino acid protein having a predicted
molecular weight of 21.1 kDa prior to cleavage of the
predicted signal sequence. Both splice variant 1A and
splice variant 1B appear to arise from a 2.1 kb
transcript which is 2055 nucleotides long, not including
the polyA sequence. This transcript encodes a 253 amino
acid protein having a predicted molecular weight of 28.6
kDa, not including the predicted signal sequence.

The 2.1 kb TANGO 189 transcript encodes a human TANGO
189 protein that is predicted to be a transmembrane
protein having a 24 or 25 amino acid signal sequence
(amino acids 1 - 24 or 1 - 25 of SEQ ID NO:31; SEQ ID NO:72
and SEQ ID NO:73) followed by a 227 or 226 amino acid
mature protein (amino acids 25 - 251 or 26 - 251 of SEQ
ID NO:31; SEQ ID NO:84 and SEQ ID NO:85) having a first
extracellular domain of 114 or 115 amino acids (amino
acids 25 - 138 or 26 - 138 of SEQ ID NO:31; SEQ ID NO:92
and SEQ ID NO:93), followed by a first transmembrane
domain (amino acids 139 - 164 of SEQ ID NO:31; SEQ ID
NO:99), a first cytoplasmic domain (amino acids 165 - 177
of SEQ ID NO:31; SEQ ID NO:106), a second transmembrane
domain (amino acids 178 - 195 of SEQ ID NO:31; SEQ ID
NO:100), a second extracellular domain (amino acids 196 -
211 of SEQ ID NO:31; SEQ ID NO:108), a third
transmembrane domain (amino acids 212 - 237 of SEQ ID
NO:31; SEQ ID NO:101), and a second cytoplasmic domain
(amino acids 238 - 253 of SEQ ID NO:31; SEQ ID NO:107).
The protein encoded by this 2.1 kb TANGO 189 transcript
is predicted to have a molecular weight of 21.8 kDa prior
to cleavage of its signal peptide and a molecular weight
of 25.2 kDa subsequent to cleavage of its signal peptide.

The predicted domain structure of the protein encoded
by splice variant 1A is identical to that of the protein
encoded by the 2.1 kb transcript up to amino acid 181.
The predicted domain structure of the protein encoded
by splice variant 1B is identical to that of the protein
encoded by the 2.1 kb transcript up to amino acid 180.

The murine TANGO 189 cDNA of SEQ ID NO:42 has a 759
nucleotide open reading frame (SEQ ID NO:52) encoding a
253 amino acid protein (SEQ ID NO:62). The cDNA and
protein sequences of murine TANGO 189 are shown in Figure
18.

Figure 30 depicts an alignment of the predicted amino
acid sequences of human (SEQ ID NO:31; splice variant
1A) and murine (SEQ ID NO:62) TANGO 189 (91.7% identity).
Figure 40 depicts an alignment of the cDNA sequences of
human (SEQ ID NO:9; splice variant 1A) and murine (SEQ ID
NO:42) TANGO 189 (51.8% identity).

Northern analysis of human TANGO 189 mRNA expression
revealed the presence of one major transcript (2.1 kb)
and four minor transcripts (3.4 kb, 4.2 kb, 6 kb, and 7
kb). The 2.1 kb transcript is expressed at a high level
in brain, spinal cord, and testis; expressed at a low
level in heart, placenta, skeletal muscle, kidney,
pancreas, lung, thyroid, lymph node, trachea, adrenal,
bone marrow, spleen, ovary, and prostate; and expressed
at a very low level in liver, stomach, thymus, small
intestine, colon, and peripheral blood lymphocytes. The
3.4 kb, 4.2 kb, 6 kb, and 7 kb transcripts are expressed
at a moderate level in brain and spinal cord; and are not
expressed in testis. The 4.6 and 7 kb transcripts are
expressed at a moderate level in peripheral blood
lymphocytes.

Murine in situ expression analysis revealed that TANGO
189 is expressed strongly and almost ubiquitously
in the mouse embryo. Tissues with the highest
expression during embryogenesis are the brain, spinal
cord, and small intestine. Expression decreases in most
if not all tissues by postnatal day 1.5, but the tissues of
highest expression remain the brain, spinal cord, and
small intestine. This pattern continues into the adult
mouse, with expression in most tissues decreasing even
more, some to background levels. Of the adult tissues
tested, the brain, spleen, small intestine, and retina
have the highest signal. High level expression is
observed in the following adult tissues: placenta
(ubiquitous), small intestine (except villi), eye
(retina), and brain (ubiquitous). Lower expression is
observed in: bladder (stronger signal in the transitional
epithelium), kidney, thymus, liver, placenta, spleen, and
colon. Expression was not observed in: heart, skeletal
muscle, diaphragm, lung, and pancreas. Embryonic
expression was observed at stages E13.5 through E17.5
(high ubiquitous signal; brain, spinal cord, and small
intestine have the strongest signal) and P1.5 (ubiquitous
signal decreased in intensity; brain, spinal cord, small
intestine, and kidney have the strongest signal).

TANGO 189 is useful as a tissue-specific marker. The
expression of TANGO 189 may be altered in a variety of
disease states (e.g., cancer). Thus, TANGO 189 nucleic
acid molecules and polypeptides as well as anti-TANGO 189
antibodies and modulators of TANGO 189 expression or
activity may be useful in the treatment of disorders of
cell proliferation and differentiation.

TANGO 215 

The human TANGO 215 cDNA of SEQ ID NO:10 has a 2160
nucleotide open reading frame (SEQ ID NO:21) encoding a
720 amino acid protein (SEQ ID NO:32). The cDNA and
protein sequences of human TANGO 215 are shown in Figure
19.

The cDNA sequence (SEQ ID NO:__) and predicted amino
acid sequence (SEQ ID NO:__) of a full-length murine
TANGO 181 clone are shown in Figure 56.

Human TANGO 215 is predicted to be a wholly secreted
protein having a 21 amino acid signal sequence (amino
acids 1 - 21 of SEQ ID NO:32; SEQ ID NO:74) followed by a
699 amino acid mature protein (amino acids 22 - 720 of
SEQ ID NO:32; SEQ ID NO:86). TANGO 215 is predicted to
have a molecular weight of 80.3 kDa prior to cleavage of
its signal peptide and a molecular weight of 77.6 kDa
subsequent to cleavage of its signal peptide.

TANGO 215 is related to C1r/C1s (C1q) and MASP1/MASP2
(mannose-binding lectin-associated serine protease)
proteases, all of which are involved in the alternative
pathway of immune response.

TANGO 215 may be a threonine protease. There is a
threonine in the sequence TGG at amino acids 664-666 of
human and murine TANGO 215. This sequence is within a
region having similarity to the active site of certain
proteases. Human TANGO 215 is predicted to have a CUB
domain (amino acids 128 - 236 of SEQ ID NO:32), an EGF
domain (amino acids 239 - 271 of SEQ ID NO:32), a small
consensus repeat (SCR) domain (amino acids 280 - 342 of
SEQ ID NO:32), a partial SCR domain (amino acids 408 -




442 of SEQ ID NO:32), and a serine protease domain (amino
acids 461 - 720 of SEQ ID NO:32).

Northern analysis of human TANGO 215 mRNA expression
revealed the presence of a 2.7 kb transcript in heart,
brain, and placenta.

In situ analysis of TANGO 215 expression in adult mice
revealed expression in the brain (cortex and caudate
putamen), kidney (cortex, most likely within the
glomeruli), bladder (ubiquitous expression), liver
(possibly within vessels), and placenta (outer membrane
region). This analysis did not detect expression in the
lung, small intestine, pancreas, thymus, eye, heart, or
muscle/diaphragm.

In situ analysis of TANGO 215 in embryos revealed
expression at E13.5 in developing limbs and vertebrae.
At E14.5 the observed expression pattern was similar to
that at E13.5 except that expression was observed in the
muscle surrounding the abdomen, the skin, and the jaw. At
E15.5 expression was observed in the developing kidney
and bladder and the outer layer of the tongue. At later
ages, E16.5 through P1.5, expression is observed in the
smooth muscle layer of the small intestine, the portal
regions of the liver, and the large airways of the lungs.
Expression in the brain is absent until E18.5, when
expression is apparent in the caudate putamen.
Expression remains strong at P1.5 in the vertebrae, tail,
and sternum and possibly the muscle between developing
bones.

The region of human TANGO 215 from amino acid 280 to
the end is predicted to be the human homologue of Limulus
Factor C (27% identity). Thus, this region of TANGO 215
is predicted to include an effector domain (serine
protease domain) and, perhaps, an LPS sensing domain.
Thus, TANGO 215 may sense and respond to LPS, with the
response to the presence of LPS being activation of
serine protease activity. Accordingly, TANGO 215 nucleic
acids and polypeptides as well as antibodies directed
against TANGO 215 and modulators of TANGO 215 expression
or activity may be useful in the diagnosis and treatment
of sepsis.

CUB domains are extracellular domains of about 110
amino acids. CUB domains are found in functionally
diverse, mostly developmentally regulated proteins. Most
contain four cysteines that are involved in two disulfide
bonds (C1-C2 and C3-C4). SCR domains are also known as
complement control protein (CCP) modules. EGF domains
are commonly involved in receptor-ligand interactions.
CUB, EGF, and SCR domains are commonly involved in
protein-protein interaction. Because these domains are
present in TANGO 215, it is predicted to interact with
one or more other proteins. The presence of these
domains in TANGO 215 suggests that TANGO 215 is involved
in development, perhaps bone and cartilage morphogenesis.
TANGO 215 nucleic acid molecules and polypeptides as well
as anti-TANGO 215 antibodies and modulators of TANGO 215
expression or activity may be useful in the treatment of
developmental disorders.

TANGO 187

The human TANGO 187-1/3 cDNA of SEQ ID NO:11 has a 1032
nucleotide open reading frame (SEQ ID NO:22) encoding a
343 amino acid protein (SEQ ID NO:33). The cDNA and
protein sequences of human TANGO 187-1/3 are shown in
Figure 20.

Human TANGO 187-1/3 is predicted to be a wholly
secreted protein having a 20 amino acid signal sequence
(amino acids 1 - 20 of SEQ ID NO:33; SEQ ID NO:75)
followed by a 323 amino acid mature protein (amino acids
21 - 343 of SEQ ID NO:33; SEQ ID NO:87). Human TANGO
187-1/3 is predicted to have a molecular weight of 37.5
kDa prior to cleavage of its signal peptide and a
molecular weight of 35.9 kDa subsequent to cleavage of
its signal peptide.

The TANGO 187-1/3 cDNA described above actually
represents one of 8 different TANGO 187 splice variants.
Each variant contains none, one, two or three of three
variant regions. These regions are referred to as region
1, region 2, and region 3, and each of the various forms
of TANGO 187 is referred to by including a reference to
the variant regions present. Thus, the form of TANGO 187
described above is TANGO 187-1/3 because it includes
regions 1 and 3.
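
The count of 8 splice variants follows from the naming scheme above: each of the three variant regions is either present or absent, giving 2 x 2 x 2 = 8 combinations. The short sketch below enumerates them using the same naming convention; the function and constant names are illustrative assumptions, not terms from the disclosure.

```python
from itertools import combinations

VARIANT_REGIONS = (1, 2, 3)  # region 1, region 2, region 3


def splice_variant_names(base: str = "TANGO 187",
                         regions: tuple = VARIANT_REGIONS) -> list:
    """List every form of the gene, named by the variant regions it contains."""
    names = []
    for count in range(len(regions) + 1):
        for present in combinations(regions, count):
            suffix = "/".join(str(region) for region in present)
            # The form with no variant regions carries no suffix.
            names.append(f"{base}-{suffix}" if suffix else base)
    return names


if __name__ == "__main__":
    for name in splice_variant_names():
        print(name)
```

The enumeration yields the form without any variant region, the three single-region forms, the three two-region forms (including TANGO 187-1/3), and TANGO 187-1/2/3.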

Figure 46 depicts the cDNA sequence (SEQ ID NO:__) and
predicted amino acid sequence (SEQ ID NO:__) of TANGO
187-1.

Figure 47 depicts the cDNA sequence (SEQ ID NO:__) and
predicted amino acid sequence (SEQ ID NO:__) of TANGO
187-2/3.

Figure 48 depicts the cDNA sequence (SEQ ID NO:__) and
predicted amino acid sequence (SEQ ID NO:__) of TANGO
187-1/2/3.

Figure 49 depicts the cDNA sequence (SEQ ID NO:__) and
predicted amino acid sequence (SEQ ID NO:__) of TANGO
187-1/2.

Figure 50 depicts the cDNA sequence (SEQ ID NO:__) and
predicted amino acid sequence (SEQ ID NO:__) of TANGO
187-2.

Figure 51 depicts the cDNA sequence (SEQ ID NO:__) and
predicted amino acid sequence (SEQ ID NO:__) of TANGO
187-3.

Figure 52 depicts the cDNA sequence (SEQ ID NO:__) and
predicted amino acid sequence (SEQ ID NO:__) of TANGO
187. This form does not include any of the three variant
regions.

The murine TANGO 187 cDNA of SEQ ID NO:43 is only a
partial sequence. This cDNA has an open reading frame
extending from nucleotide 73 to the end of the available
sequence (SEQ ID NO:53) encoding a 152 amino acid protein
(SEQ ID NO:63). The partial cDNA and protein sequences
of murine TANGO 187 are shown in Figure 21.

Figure 31 depicts an alignment of the predicted amino
acid sequences of human (SEQ ID NO:33) and murine (SEQ ID
NO:63; partial) TANGO 187 (50.4% identity). Figure 41
depicts an alignment of the cDNA sequences of human (SEQ
ID NO:11) and murine (SEQ ID NO:43; partial) TANGO 187
(66.0% identity).

Northern analysis of human TANGO 187 mRNA expression
revealed the presence of 1.3 and 2.4 kb transcripts that
are approximately equally expressed at a low level in
heart, brain, lung, liver, and smooth muscle and at a
moderate level in kidney and placenta.

In situ analysis of TANGO 187 expression in adult mice
revealed that TANGO 187 is expressed in brain (weak,
ubiquitous signal), eye and harderian gland (weak signal
in the retina), submandibular gland (weak, ubiquitous
signal), stomach (weak, ubiquitous signal), kidney (weak,
ubiquitous signal), adrenal gland (low level, ubiquitous
expression), colon (low level, ubiquitous expression),
small intestine (low level, ubiquitous expression),
thymus (moderate level, ubiquitous expression in the
cortical region with lower expression in the medulla),
lymph node (ubiquitous expression), spleen (low level
ubiquitous expression with lower expression in the
follicles), bladder (moderate expression in the mucosal
epithelium), and testes (moderate, ubiquitous expression
signal that defines the seminiferous vesicles). In this
analysis, TANGO 187 expression was not detectable in the
spinal cord, brown fat, heart, lung, liver, pancreas,
skeletal muscle, and ovaries.

In situ analysis of TANGO 187 expression in embryos at
E13.5 revealed ubiquitous expression with the strongest
expression in the brain and spinal cord. A punctate
expression pattern was observed in the lungs, suggestive
of higher expression in the developing large airways. At
E14.5 the expression pattern was similar to that observed
at E13.5 except that expression was observed in the
developing olfactory system and the eye at a level
similar to that observed in the brain and spinal cord.
Expression is also present at E14.5 in the epithelium of
the tongue, the dermis of the snout, the kidneys and the
stomach. At E15.5 low level ubiquitous expression was
observed with the highest expression in the brain, spinal
cord, eye, and olfactory system. Slightly lower
expression was observed in the lung (ubiquitous
expression) and kidney (cortical region) than in the
aforementioned neuronal tissues. At E16.5 the observed
expression pattern is identical to that seen at E15.5
except TANGO 187 expression is observed in the thymus and
the mucosal portion of the stomach. At E18.5 TANGO 187
expression continues to be highest in neuronal tissue,
with lower expression in the hind brain and spinal cord
than in the forebrain, and with the neopallial cortex
having the highest signal. At E16.5 expression is
observed in the thymus and small intestine. At P1.5 the
observed expression pattern is nearly identical to that
at E18.5 except that expression in the lung and stomach
has decreased. At P1.5 expression is highest in the
brain, eye, olfactory epithelium and kidney.

TANGO 187 contains a region moderately similar to an
armadillo/beta-catenin repeat. Such repeats are thought
to be involved in protein-protein interactions.

TABLE 1: Summary of Human TANGO 180, TANGO 181, TANGO
182, TANGO 183, TANGO 184, TANGO 185, TANGO
186, TANGO 187, TANGO 188, TANGO 189, and
TANGO 215 Sequence Information.

TABLE 2: Summary of Domains of Human TANGO 180, TANGO
181, TANGO 182, TANGO 183, TANGO 184, TANGO
185, TANGO 186, TANGO 187, TANGO 188, TANGO
189, and TANGO 215.

Protein        Signal Sequence            Mature Protein               Extracellular Domain          Transmembrane Domain           Cytoplasmic Domain
TANGO 180      aa 1-22 (SEQ ID NO:64)     aa 23-189 (SEQ ID NO:76)
TANGO 181      aa 1-22 (SEQ ID NO:65)     aa 23-339 (SEQ ID NO:77)
TANGO 182      aa 1-23 (SEQ ID NO:66)     aa 24-348 (SEQ ID NO:78)
TANGO 183      aa 1-20 (SEQ ID NO:67)     aa 21-183 (SEQ ID NO:79)     aa 21-89 (SEQ ID NO:88)       aa 90-112 (SEQ ID NO:94)       aa 113-183 (SEQ ID NO:102)
TANGO 184      aa 1-28 (SEQ ID NO:68)     aa 29-198 (SEQ ID NO:80)     aa 29-102 (SEQ ID NO:89)      aa 103-125 (SEQ ID NO:95)      aa 126-198 (SEQ ID NO:103)
TANGO 185      aa 1-24 (SEQ ID NO:69)     aa 25-193 (SEQ ID NO:81)     aa 25-75 (SEQ ID NO:90) and   aa 76-102 (SEQ ID NO:96) and   aa 103-109 (SEQ ID NO:104) and
                                                                       aa 131-150 (SEQ ID NO:91)     aa 110-131 (SEQ ID NO:97) and  aa 175-193 (SEQ ID NO:105)
                                                                                                     aa 151-174 (SEQ ID NO:98)
TANGO 186      aa 1-20 (SEQ ID NO:70)     aa 21-383 (SEQ ID NO:82)
TANGO 188      aa 1-23 (SEQ ID NO:71)     aa 24-264 (SEQ ID NO:83)
TANGO 189      aa 1-24 (SEQ ID NO:72) or  aa 25-251 (SEQ ID NO:84) or  aa 25-138 (SEQ ID NO:92) or   aa 139-164 (SEQ ID NO:99) and  aa 165-177 (SEQ ID NO:106) and
               aa 1-25 (SEQ ID NO:73)     aa 26-251 (SEQ ID NO:85)     aa 26-138 (SEQ ID NO:93) and  aa 178-195 (SEQ ID NO:100) and aa 238-253 (SEQ ID NO:107)
                                                                       aa 196-211 (SEQ ID NO:108)    aa 212-237 (SEQ ID NO:101)
TANGO 215      aa 1-21 (SEQ ID NO:74)     aa 22-720 (SEQ ID NO:86)
TANGO 187-1/3  aa 1-20 (SEQ ID NO:75)     aa 21-343 (SEQ ID NO:87)

TABLE 3: Summary of Murine TANGO 180, TANGO
181, TANGO 182, TANGO 183, TANGO 184, TANGO
185, TANGO 186, TANGO 188, TANGO 189, and
TANGO 187 Sequence Information

Gene                 cDNA           ORF            Protein        Figure    AA align. with human  NA align. with human
TANGO 180            SEQ ID NO:34   SEQ ID NO:44   SEQ ID NO:54   Fig. 2    Fig. 22               Fig. 32
TANGO 181 (partial)  SEQ ID NO:35   SEQ ID NO:45   SEQ ID NO:55   Fig. 4    Fig. 23               Fig. 33
TANGO 182 (partial)  SEQ ID NO:36   SEQ ID NO:46   SEQ ID NO:56   Fig. 6    Fig. 24               Fig. 34
TANGO 183            SEQ ID NO:37   SEQ ID NO:47   SEQ ID NO:57   Fig. 8    Fig. 25               Fig. 35
TANGO 184            SEQ ID NO:38   SEQ ID NO:48   SEQ ID NO:58   Fig. 10   Fig. 26               Fig. 36
TANGO 185            SEQ ID NO:39   SEQ ID NO:49   SEQ ID NO:59   Fig. 12   Fig. 27               Fig. 37
TANGO 186            SEQ ID NO:40   SEQ ID NO:50   SEQ ID NO:60   Fig. 14   Fig. 28               Fig. 38
TANGO 188            SEQ ID NO:41   SEQ ID NO:51   SEQ ID NO:61   Fig. 16   Fig. 29               Fig. 39
TANGO 189            SEQ ID NO:42   SEQ ID NO:52   SEQ ID NO:62   Fig. 18   Fig. 30               Fig. 40
TANGO 187 (partial)  SEQ ID NO:43   SEQ ID NO:53   SEQ ID NO:63   Fig. 21   Fig. 31               Fig. 41
TANGO 181            SEQ ID NO:__   SEQ ID NO:__   SEQ ID NO:__   Fig. 53
TANGO 182            SEQ ID NO:__   SEQ ID NO:__   SEQ ID NO:__   Fig. 54
TANGO 187            SEQ ID NO:__   SEQ ID NO:__   SEQ ID NO:__   Fig. 55
TANGO 215            SEQ ID NO:__   SEQ ID NO:__   SEQ ID NO:__   Fig. 56







Various aspects of the invention are described in
further detail in the following subsections.

I. Isolated Nucleic Acid Molecules

One aspect of the invention pertains to isolated
nucleic acid molecules that encode a polypeptide of the
invention or a biologically active portion thereof, as
well as nucleic acid molecules sufficient for use as
hybridization probes to identify nucleic acid molecules
encoding a polypeptide of the invention and fragments of
such nucleic acid molecules suitable for use as PCR
primers for the amplification or mutation of nucleic acid
molecules. As used herein, the term "nucleic acid
molecule" is intended to include DNA molecules (e.g.,
cDNA or genomic DNA) and RNA molecules (e.g., mRNA) and
analogs of the DNA or RNA generated using nucleotide
analogs. The nucleic acid molecule can be single-stranded
or double-stranded, but preferably is double-stranded
DNA.

An "isolated" nucleic acid molecule is one which is
separated from other nucleic acid molecules which are
present in the natural source of the nucleic acid
molecule. Preferably, an "isolated" nucleic acid
molecule is free of sequences (preferably protein
encoding sequences) which naturally flank the nucleic
acid (i.e., sequences located at the 5' and 3' ends of
the nucleic acid) in the genomic DNA of the organism from
which the nucleic acid is derived. For example, in
various embodiments, the isolated nucleic acid molecule
can contain less than about 5 kb, 4 kb, 3 kb, 2 kb, 1 kb,
0.5 kb or 0.1 kb of nucleotide sequences which naturally
flank the nucleic acid molecule in genomic DNA of the
cell from which the nucleic acid is derived. Moreover,
an "isolated" nucleic acid molecule, such as a cDNA
molecule, can be substantially free of other cellular
material, or culture medium when produced by recombinant
techniques, or substantially free of chemical precursors
or other chemicals when chemically synthesized.

A nucleic acid molecule of the present invention, e.g.,
a nucleic acid molecule having the nucleotide sequence of
any of SEQ ID NOs:1-22, 34-43, and __-__, or the cDNA
of a clone deposited as any of ATCC 98899, 98900, and
989001, or a complement thereof, can be isolated using
standard molecular biology techniques and the sequence
information provided herein. Using all or a portion of
the nucleic acid sequences of any of SEQ ID NOs:1-22,
34-43, and __-__, or the cDNA of a clone deposited as any
of ATCC 98899, 98900, and 989001 as a hybridization
probe, nucleic acid molecules of the invention can be
isolated using standard hybridization and cloning
techniques (e.g., as described in Sambrook et al., eds.,
Molecular Cloning: A Laboratory Manual, 2nd ed., Cold
Spring Harbor Laboratory, Cold Spring Harbor Laboratory
Press, Cold Spring Harbor, NY, 1989).

A nucleic acid molecule of the invention can be
amplified using cDNA, mRNA or genomic DNA as a template
and appropriate oligonucleotide primers according to
standard PCR amplification techniques. The nucleic acid
so amplified can be cloned into an appropriate vector and
characterized by DNA sequence analysis. Furthermore,
oligonucleotides corresponding to all or a portion of a
nucleic acid molecule of the invention can be prepared by
standard synthetic techniques, e.g., using an automated
DNA synthesizer.
In another preferred embodiment, an isolated nucleic
acid molecule of the invention comprises a nucleic acid
molecule which is a complement of the nucleotide sequence
shown in SEQ ID NOs:1-22, 34-43, and __-__, or the
cDNA of a clone deposited as ATCC 98899, 98900, and
989001, or a portion thereof. A nucleic acid molecule
which is complementary to a given nucleotide sequence is
one which is sufficiently complementary to the given
nucleotide sequence that it can hybridize to the given
nucleotide sequence thereby forming a stable duplex.
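
As an illustration of the idea of a complement, the sketch below computes the exact Watson-Crick reverse complement of a DNA strand. This is a simplification offered as an assumption for clarity: the definition above requires only that a molecule be sufficiently complementary to form a stable duplex, which is broader than the perfect complement computed here. The function name and the example sequence are hypothetical.

```python
# Translation table mapping each base to its Watson-Crick partner.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna: str) -> str:
    """Return the perfectly complementary strand, written 5' to 3'."""
    invalid = set(dna) - set("ACGTacgt")
    if invalid:
        raise ValueError(f"unexpected characters: {sorted(invalid)}")
    # Complement each base, then reverse so the result also reads 5' to 3'.
    return dna.translate(_COMPLEMENT)[::-1]


if __name__ == "__main__":
    # Toy sequence, not taken from any SEQ ID NO.
    print(reverse_complement("ATGGCC"))
```

Applying the function twice returns the original strand, which is a quick sanity check on any complement routine.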

Moreover, a nucleic acid molecule of the invention can
comprise only a portion of a nucleic acid sequence
encoding a full length polypeptide of the invention, for
example, a fragment which can be used as a probe or
primer or a fragment encoding a biologically active
portion of a polypeptide of the invention. The nucleotide
sequence determined from the cloning of one gene allows
for the generation of probes and primers designed for use
in identifying and/or cloning homologues in other cell
types, e.g., from other tissues, as well as homologues
from other mammals. The probe/primer typically comprises
substantially purified oligonucleotide. The
oligonucleotide typically comprises a region of
nucleotide sequence that hybridizes under stringent
conditions to at least about 12, preferably about 25,
more preferably about 50, 75, 100, 125, 150, 175, 200,
250, 300, 350 or 400 consecutive nucleotides of the sense
or anti-sense sequence of any of SEQ ID NOs:1-22, 34-43,
and __-__, or the cDNA of a clone deposited as ATCC
98899, 98900, and 989001, or of a naturally occurring
mutant of any of SEQ ID NOs:1-22, 34-43, and __-__, or
the cDNA of a clone deposited as ATCC 98899, 98900, and
989001.

Probes based on the sequence of a nucleic acid molecule
of the invention can be used to detect transcripts or
genomic sequences encoding the same protein molecule
encoded by a selected nucleic acid molecule. The probe
comprises a label group attached thereto, e.g., a
radioisotope, a fluorescent compound, an enzyme, or an
enzyme co-factor. Such probes can be used as part of a
diagnostic test kit for identifying cells or tissues
which mis-express the protein, such as by measuring
levels of a nucleic acid molecule encoding the protein in
a sample of cells from a subject, e.g., detecting mRNA
levels or determining whether a gene encoding the protein
has been mutated or deleted.

A nucleic acid fragment encoding a "biologically active
portion" of a polypeptide of the invention can be
prepared by isolating a portion of any of SEQ ID NOs:1-22,
34-43, and __-__, or the nucleotide sequence of
the cDNA of a clone deposited as ATCC 98899, 98900, and
989001 which encodes a polypeptide having a biological
activity, expressing the encoded portion of the
polypeptide protein (e.g., by recombinant expression in
vitro) and assessing the activity of the encoded portion
of the polypeptide.

The invention further encompasses nucleic acid
molecules that differ from the nucleotide sequence of SEQ
ID NOs:1-22, 34-43, and __-__, or the cDNA of a clone
of ATCC 98899, 98900, and 989001 due to degeneracy of the
genetic code and thus encode the same protein as that
encoded by the nucleotide sequence shown in any of SEQ ID
NOs:1-22, 34-43, and __-__, or the cDNA of a clone
deposited as ATCC 98899, 98900, and 989001.

In addition to the nucleotide sequences shown in SEQ ID
NOs:1-22, 34-43, and __-__ and present in the cDNAs of
the clones deposited as ATCC 98899, 98900, and 989001, it
will be appreciated by those skilled in the art that DNA
sequence polymorphisms that lead to changes in the amino
acid sequence may exist within a population (e.g., the
human population). Such genetic polymorphisms may exist
among individuals within a population due to natural
allelic variation. An allele is one of a group of genes
which occur alternatively at a given genetic locus. As
used herein, the phrase "allelic variant" refers to a
nucleotide sequence which occurs at a given locus or to a
polypeptide encoded by the nucleotide sequence. As used
herein, the terms "gene" and "recombinant gene" refer to
nucleic acid molecules comprising an open reading frame
encoding a polypeptide of the invention. Such natural
allelic variations can typically result in 1-5% variance
in the nucleotide sequence of a given gene. Alternative
alleles can be identified by sequencing the gene of
interest in a number of different individuals. This can
be readily carried out by using hybridization probes to
identify the same genetic locus in a variety of
individuals. Any and all such nucleotide variations and
resulting amino acid polymorphisms or variations that are
the result of natural allelic variation and that do not
alter the functional activity are intended to be within
the scope of the invention.

Moreover, nucleic acid molecules encoding proteins of
the invention from other species (homologues), which have
a nucleotide sequence which differs from that of the
human protein described herein, are intended to be within
the scope of the invention. Nucleic acid molecules
corresponding to natural allelic variants and homologues
of a cDNA of the invention can be isolated based on their
identity to the human nucleic acid molecule disclosed
herein using the human cDNAs, or a portion thereof, as a
hybridization probe according to standard hybridization
techniques under stringent hybridization conditions. For
example, a cDNA encoding a soluble form of a membrane-bound
protein of the invention can be isolated based on its
hybridization to a nucleic acid molecule encoding all or
part of the membrane-bound form. Likewise, a cDNA
encoding a membrane-bound form can be isolated based on
its hybridization to a nucleic acid molecule encoding all
or part of the soluble form.

Accordingly, in another embodiment, an isolated nucleic
acid molecule of the invention is at least 300 (325, 350,
375, 400, 425, 450, 500, 550, 600, 650, 700, 800, 900,
1000, or 1290) nucleotides in length and hybridizes under
stringent conditions to the nucleic acid molecule
comprising the nucleotide sequence, preferably the coding
sequence, of any of SEQ ID NOs:1-22, 34-43, and __-__,
the cDNA of a clone deposited as ATCC 98899, 98900, and
989001, or a complement thereof.

As used herein, the term "hybridizes under stringent
conditions" is intended to describe conditions for
hybridization and washing under which nucleotide
sequences at least 60% (65%, 70%, preferably 75%)
identical to each other typically remain hybridized to
each other. Such stringent conditions are known to those
skilled in the art and can be found in Current Protocols
in Molecular Biology, John Wiley & Sons, N.Y. (1989),
6.3.1-6.3.6. A preferred, non-limiting example of
stringent hybridization conditions is hybridization in
6X sodium chloride/sodium citrate (SSC) at about 45°C,
followed by one or more washes in 0.2X SSC, 0.1% SDS at
50-65°C. Preferably, an isolated nucleic acid molecule
of the invention that hybridizes under stringent
conditions to the sequence of any of SEQ ID NOs:1-22,
34-43, and __-__, the cDNA of ATCC 98899, 98900, and
989001, or the complement thereof, corresponds to a
naturally-occurring nucleic acid molecule. As used
herein, a "naturally-occurring" nucleic acid molecule
refers to an RNA or DNA molecule having a nucleotide
sequence that occurs in nature (e.g., encodes a natural
protein).
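
The 6X and 0.2X SSC strengths named in the example of stringent conditions above translate into salt concentrations by simple scaling. The sketch below assumes the standard laboratory definition of 1X SSC (0.15 M sodium chloride, 0.015 M sodium citrate); that definition comes from common practice rather than from this text, and the names used in the code are illustrative.

```python
# Assumed standard composition of 1X SSC, in mol/L (not stated in the text).
SSC_1X_MOLAR = {"sodium chloride": 0.150, "sodium citrate": 0.015}


def ssc_molarity(strength: float) -> dict[str, float]:
    """Scale the 1X SSC composition to the requested strength (e.g. 6 for 6X)."""
    if strength <= 0:
        raise ValueError("strength must be positive")
    # Rounding removes floating-point noise from the multiplication.
    return {solute: round(molar * strength, 6)
            for solute, molar in SSC_1X_MOLAR.items()}


if __name__ == "__main__":
    print("hybridization, 6X SSC:", ssc_molarity(6))
    print("wash, 0.2X SSC:", ssc_molarity(0.2))
```

The lower salt concentration of the wash, together with the higher wash temperature, is what makes the wash step the more stringent of the two.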

In addition to naturally-occurring allelic variants of
a nucleic acid molecule of the invention sequence that
may exist in the population, the skilled artisan will
further appreciate that changes can be introduced by
mutation thereby leading to changes in the amino acid
sequence of the encoded protein, without altering the
biological activity of the protein. For example, one can
make nucleotide substitutions leading to amino acid
substitutions at "non-essential" amino acid residues. A
"non-essential" amino acid residue is a residue that can
be altered from the wild-type sequence without altering
the biological activity, whereas an "essential" amino
acid residue is required for biological activity. For
example, amino acid residues that are not conserved or
only semi-conserved among homologues of various species
may be non-essential for activity and thus would be
likely targets for alteration. Alternatively, amino acid
residues that are conserved among the homologues of
various species (e.g., murine and human) may be essential
for activity and thus would not be likely targets for
alteration. Conserved cysteine residues are particularly
important and are preferably retained in functional
variants.

Accordingly, another aspect of the invention
pertains to nucleic acid molecules encoding a polypeptide
of the invention that contain changes in amino acid
residues that are not essential for activity. Such
polypeptides differ in amino acid sequence from SEQ ID
NOs:23-33, 54-63, and __-__ yet retain biological
activity. In one embodiment, the isolated nucleic acid
molecule includes a nucleotide sequence encoding a
protein that includes an amino acid sequence that is at
least about 45%, 65%, 75%, 85%, 95%, or 98%
identical to the amino acid sequence of any of SEQ ID
NOs:23-33, 54-63, and __-__.

5 An isolated nucleic acid molecule encoding a variant 
protein can be created by introducing one or more 
nucleotide substitutions, additions or deletions into the 

nucleotide sequence of SEQ ID NOs:l-22, 34-43, and - 

the cDNA of a clone deposited of ATCC 98899, 98900, 

10 and 989001 such that one or more amino acid 

substitutions, additions or deletions are introduced into 
the encoded protein. Mutations can be introduced by 
standard techniques, such as site -directed mutagenesis 
and PCR^-mediated mutagenesis. Preterahly, conservative 

15 amino acid substitutions are made at one or more 
predicted non-essential amino acid residues. A 
"conservative amino acid substitution" is one in which 
the amino acid residue is replaced with an amino acid 
residue having a similar side chain. Families of amino 

20 acid residues having similar side chains have been 

defined in the art. These families include amino acids 
with basic side chains (e.g., lysine, arginine, 
histidine) , acidic side chains (e.g., aspasrtic acid, 
glutamic acid), uncharged polar side chains (e.g., 

25 glycine, asparagine, glutamine, serine, threonine, 

tyrosine, cysteine), nonpolar side chains (e.g., alanine, 
valine, leucine, isoleucine, proline, phenylalanine, 
methionine, tryptophan), beta-branched side chains (e.g,, 
threonine, valine, isoleucine) and aromatic side chains 

30 (e.g., tyrosine, phenylalanine, tryptophan, histidine). 
Alternatively, mutations can be introduced randomly along
all or part of the coding sequence, such as by
saturation mutagenesis, and the resultant mutants can be
screened for biological activity to identify mutants that
retain activity. Following mutagenesis, the encoded
protein can be expressed recombinantly and the activity
of the protein can be determined.
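
The side-chain families listed above can be written down as a small lookup table. The following is a minimal Python sketch, assuming one-letter amino acid codes; the names SIDE_CHAIN_FAMILIES, shared_families, and is_conservative are illustrative only and do not appear in the specification. Because the families overlap (for example, threonine is both uncharged polar and beta-branched), a substitution is treated here as conservative when the two residues share at least one family.

```python
# Side-chain families as enumerated in the text, in one-letter codes.
# The grouping is taken from the text; the data structure is an assumption.
SIDE_CHAIN_FAMILIES = {
    "basic": set("KRH"),             # lysine, arginine, histidine
    "acidic": set("DE"),             # aspartic acid, glutamic acid
    "uncharged_polar": set("GNQSTYC"),
    "nonpolar": set("AVLIPFMW"),
    "beta_branched": set("TVI"),     # threonine, valine, isoleucine
    "aromatic": set("YFWH"),
}


def shared_families(residue_a, residue_b):
    """Return the sorted names of families that contain both residues."""
    a, b = residue_a.upper(), residue_b.upper()
    return sorted(
        name
        for name, members in SIDE_CHAIN_FAMILIES.items()
        if a in members and b in members
    )


def is_conservative(residue_a, residue_b):
    """True if the residues differ but share at least one family."""
    if residue_a.upper() == residue_b.upper():
        return False  # identical residues are not a substitution
    return bool(shared_families(residue_a, residue_b))


if __name__ == "__main__":
    print(is_conservative("K", "R"))   # lysine -> arginine
    print(is_conservative("K", "D"))   # lysine -> aspartic acid
    print(shared_families("T", "V"))
```

Such a check only flags candidate substitutions; whether a given residue is non-essential still has to be predicted or tested separately.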

In a preferred embodiment, a mutant polypeptide that is
a variant of a polypeptide of the invention can be
assayed for: (1) the ability to form protein:protein
interactions with proteins in a signalling pathway of the
polypeptide of the invention; (2) the ability to bind a
ligand of the polypeptide of the invention; or (3) the
ability to bind to an intracellular target protein of the
polypeptide of the invention. In yet another preferred
embodiment, the mutant polypeptide can be assayed for the
ability to modulate cellular proliferation or cellular
differentiation.

The present invention encompasses antisense nucleic
acid molecules, i.e., molecules which are complementary
to a sense nucleic acid encoding a polypeptide of the
invention, e.g., complementary to the coding strand of a
double-stranded cDNA molecule or complementary to an mRNA
sequence. Accordingly, an antisense nucleic acid can
hydrogen bond to a sense nucleic acid. The antisense
nucleic acid can be complementary to an entire coding
strand, or to only a portion thereof, e.g., all or part
of the protein coding region (or open reading frame). An
antisense nucleic acid molecule can be antisense to all
or part of a noncoding region of the coding strand of a
nucleotide sequence encoding a polypeptide of the
invention. The noncoding regions ("5' and 3'
untranslated regions") are the 5' and 3' sequences which
flank the coding region and are not translated into amino
acids.
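
Complementarity to a portion of the coding strand can be illustrated with a short Python sketch. It assumes a DNA coding strand written 5' to 3' in the letters A, C, G, and T; the function name antisense_oligo and the toy sequence are hypothetical and are not taken from any sequence of the invention.

```python
# Watson-Crick complements for DNA bases.
DNA_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def antisense_oligo(coding_strand, start, length):
    """Return the antisense oligonucleotide (written 5' to 3') that is
    complementary to coding_strand[start:start + length].

    The coding strand is assumed to be given 5' to 3'. The result is the
    reverse complement of the targeted window.
    """
    target = coding_strand.upper()[start:start + length]
    if len(target) != length:
        raise ValueError("target window runs past the end of the sequence")
    try:
        complement = [DNA_COMPLEMENT[base] for base in target]
    except KeyError as exc:
        raise ValueError(f"unexpected base: {exc.args[0]}") from None
    # Reverse so that the oligonucleotide is also written 5' to 3'.
    return "".join(reversed(complement))


if __name__ == "__main__":
    toy_coding_strand = "ATGGCCAAGT"  # illustrative only
    print(antisense_oligo(toy_coding_strand, 0, 6))
```

The same window arithmetic applies whether the target lies in the open reading frame or in a 5' or 3' untranslated region; only the start coordinate changes.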

An antisense oligonucleotide can be, for example,
about 5, 10, 15, 20, 25, 30, 35, 40, 45 or 50 nucleotides
in length. An antisense nucleic acid of the invention
can be constructed using chemical synthesis and enzymatic
ligation reactions using procedures known in the art.
For example, an antisense nucleic acid (e.g., an
antisense oligonucleotide) can be chemically synthesized
using naturally occurring nucleotides or variously
modified nucleotides designed to increase the biological
stability of the molecules or to increase the physical
stability of the duplex formed between the antisense and
sense nucleic acids, e.g., phosphorothioate derivatives
and acridine substituted nucleotides can be used.
Examples of modified nucleotides which can be used to
generate the antisense nucleic acid include
5-fluorouracil, 5-bromouracil, 5-chlorouracil,
5-iodouracil, hypoxanthine, xanthine, 4-acetylcytosine,
5-(carboxyhydroxylmethyl)uracil,
5-carboxymethylaminomethyl-2-thiouridine,
5-carboxymethylaminomethyluracil, dihydrouracil,
beta-D-galactosylqueosine, inosine,
N6-isopentenyladenine, 1-methylguanine, 1-methylinosine,
2,2-dimethylguanine, 2-methyladenine, 2-methylguanine,
3-methylcytosine, 5-methylcytosine, N6-adenine,
7-methylguanine, 5-methylaminomethyluracil,
5-methoxyaminomethyl-2-thiouracil,
beta-D-mannosylqueosine, 5'-methoxycarboxymethyluracil,
5-methoxyuracil, 2-methylthio-N6-isopentenyladenine,
uracil-5-oxyacetic acid (v), wybutoxosine, pseudouracil,
queosine, 2-thiocytosine, 5-methyl-2-thiouracil,
2-thiouracil, 4-thiouracil, 5-methyluracil,
uracil-5-oxyacetic acid methylester, uracil-5-oxyacetic
acid (v), 5-methyl-2-thiouracil,
3-(3-amino-3-N-2-carboxypropyl)uracil, (acp3)w, and
2,6-diaminopurine. Alternatively, the
antisense nucleic acid can be produced biologically using
an expression vector into which a nucleic acid has been
subcloned in an antisense orientation (i.e., RNA
transcribed from the inserted nucleic acid will be of an
antisense orientation to a target nucleic acid of
interest, described further in the following subsection).

The antisense nucleic acid molecules of the invention
are typically administered to a subject or generated in
situ such that they hybridize with or bind to cellular
mRNA and/or genomic DNA encoding a selected polypeptide
of the invention to thereby inhibit expression, e.g., by
inhibiting transcription and/or translation. The
hybridization can be by conventional nucleotide
complementarity to form a stable duplex, or, for example,
in the case of an antisense nucleic acid molecule which
binds to DNA duplexes, through specific interactions in
the major groove of the double helix. An example of a
route of administration of antisense nucleic acid
molecules of the invention includes direct injection at a
tissue site. Alternatively, antisense nucleic acid
molecules can be modified to target selected cells and
then administered systemically. For example, for
systemic administration, antisense molecules can be
modified such that they specifically bind to receptors or
antigens expressed on a selected cell surface, e.g., by
linking the antisense nucleic acid molecules to peptides
or antibodies which bind to cell surface receptors or
antigens. The antisense nucleic acid molecules can also
be delivered to cells using the vectors described herein.
To achieve sufficient intracellular concentrations of the
antisense molecules, vector constructs in which the
antisense nucleic acid molecule is placed under the
control of a strong pol II or pol III promoter are
preferred.

An antisense nucleic acid molecule of the invention can
be an α-anomeric nucleic acid molecule. An α-anomeric
nucleic acid molecule forms specific double-stranded
hybrids with complementary RNA in which, contrary to the
usual β-units, the strands run parallel to each other
(Gaultier et al. (1987) Nucleic Acids Res. 15:6625-6641).
The antisense nucleic acid molecule can also comprise a
2'-o-methylribonucleotide (Inoue et al. (1987) Nucleic
Acids Res. 15:6131-6148) or a chimeric RNA-DNA analogue
(Inoue et al. (1987) FEBS Lett. 215:327-330).

The invention also encompasses ribozymes. Ribozymes
are catalytic RNA molecules with ribonuclease activity
which are capable of cleaving a single-stranded nucleic
acid, such as an mRNA, to which they have a complementary
region. Thus, ribozymes (e.g., hammerhead ribozymes
(described in Haseloff and Gerlach (1988) Nature
334:585-591)) can be used to catalytically cleave mRNA
transcripts to thereby inhibit translation of the protein
encoded by the mRNA. A ribozyme having specificity for a
nucleic acid molecule encoding a polypeptide of the
invention can be designed based upon the nucleotide
sequence of a cDNA disclosed herein. For example, a
derivative of a Tetrahymena L-19 IVS RNA can be
constructed in which the nucleotide sequence of the
active site is complementary to the nucleotide sequence
to be cleaved (see Cech et al. U.S. Patent No. 4,987,071;
and Cech et al. U.S. Patent No. 5,116,742).

Alternatively, an mRNA encoding a polypeptide of the
invention can be used to select a catalytic RNA having a
specific ribonuclease activity from a pool of RNA
molecules. See, e.g., Bartel and Szostak (1993) Science
261:1411-1418.

The invention also encompasses nucleic acid molecules
which form triple helical structures. For example,
expression of a polypeptide of the invention can be
inhibited by targeting nucleotide sequences complementary
to the regulatory region of the gene encoding the
polypeptide (e.g., the promoter and/or enhancer) to form
triple helical structures that prevent transcription of
the gene in target cells. See generally Helene (1991)
Anticancer Drug Des. 6(6):569-84; Helene (1992) Ann. N.Y.
Acad. Sci. 660:27-36; and Maher (1992) Bioassays
14(12):807-15.

In preferred embodiments, the nucleic acid molecules of
the invention can be modified at the base moiety, sugar
moiety or phosphate backbone to improve, e.g., the
stability, hybridization, or solubility of the molecule.
For example, the deoxyribose phosphate backbone of the
nucleic acids can be modified to generate peptide nucleic
acids (see Hyrup et al. (1996) Bioorganic & Medicinal
Chemistry 4(1):5-23). As used herein, the terms
"peptide nucleic acids" or "PNAs" refer to nucleic acid
mimics, e.g., DNA mimics, in which the deoxyribose
phosphate backbone is replaced by a pseudopeptide
backbone and only the four natural nucleobases are
retained. The neutral backbone of PNAs has been shown to
allow for specific hybridization to DNA and RNA under
conditions of low ionic strength. The synthesis of PNA
oligomers can be performed using standard solid phase
peptide synthesis protocols as described in Hyrup et al.
(1996), supra; Perry-O'Keefe et al. (1996) Proc. Natl.
Acad. Sci. USA 93:14670-675.

PNAs can be used in therapeutic and diagnostic
applications. For example, PNAs can be used as antisense
or antigene agents for sequence-specific modulation of
gene expression by, e.g., inducing transcription or
translation arrest or inhibiting replication. PNAs can
also be used, e.g., in the analysis of single base pair
mutations in a gene by, e.g., PNA directed PCR clamping;
as artificial restriction enzymes when used in
combination with other enzymes, e.g., S1 nucleases (Hyrup
(1996), supra); or as probes or primers for DNA sequence
and hybridization (Hyrup (1996), supra; Perry-O'Keefe et
al. (1996) Proc. Natl. Acad. Sci. USA 93:14670-675).
In another embodiment, PNAs can be modified, e.g., to
enhance their stability or cellular uptake, by attaching
lipophilic or other helper groups to PNA, by the
formation of PNA-DNA chimeras, or by the use of liposomes
or other techniques of drug delivery known in the art.
For example, PNA-DNA chimeras can be generated which may
combine the advantageous properties of PNA and DNA. Such
chimeras allow DNA recognition enzymes, e.g., RNAse H and
DNA polymerases, to interact with the DNA portion while
the PNA portion would provide high binding affinity and
specificity. PNA-DNA chimeras can be linked using
linkers of appropriate lengths selected in terms of base
stacking, number of bonds between the nucleobases, and
orientation (Hyrup (1996), supra). The synthesis of
PNA-DNA chimeras can be performed as described in Hyrup
(1996), supra, and Finn et al. (1996) Nucleic Acids Res.
24(17):3357-63. For example, a DNA chain can be
synthesized on a solid support using standard
phosphoramidite coupling chemistry and modified
nucleoside analogs. Compounds such as
5'-(4-methoxytrityl)amino-5'-deoxy-thymidine phosphoramidite
can be used as a link between the PNA and the 5' end of
DNA (Mag et al. (1989) Nucleic Acids Res. 17:5973-88).
PNA monomers are then coupled in a stepwise manner to
produce a chimeric molecule with a 5' PNA segment and a
3' DNA segment (Finn et al. (1996) Nucleic Acids Res.
24(17):3357-63). Alternatively, chimeric molecules can
be synthesized with a 5' DNA segment and a 3' PNA segment
(Peterser et al. (1975) Bioorganic Med. Chem. Lett.
5:1119-11124).

In other embodiments, the oligonucleotide may include
other appended groups such as peptides (e.g., for
targeting host cell receptors in vivo), or agents
facilitating transport across the cell membrane (see,
e.g., Letsinger et al. (1989) Proc. Natl. Acad. Sci. USA
86:6553-6556; Lemaitre et al. (1987) Proc. Natl. Acad.
Sci. USA 84:648-652; PCT Publication No. WO 88/09810) or
the blood-brain barrier (see, e.g., PCT Publication No.
WO 89/10134). In addition, oligonucleotides can be
modified with hybridization-triggered cleavage agents
(see, e.g., Krol et al. (1988) Bio/Techniques 6:958-976)
or intercalating agents (see, e.g., Zon (1988) Pharm.
Res. 5:539-549). To this end, the oligonucleotide may be
conjugated to another molecule, e.g., a peptide,
hybridization-triggered cross-linking agent, transport
agent, hybridization-triggered cleavage agent, etc.

II. Isolated Proteins and Antibodies

One aspect of the invention pertains to isolated
proteins, and biologically active portions thereof, as
well as polypeptide fragments suitable for use as
immunogens to raise antibodies directed against a
polypeptide of the invention. In one embodiment, the
native polypeptide can be isolated from cells or tissue
sources by an appropriate purification scheme using
standard protein purification techniques. In another
embodiment, polypeptides of the invention are produced by
recombinant DNA techniques. Alternative to recombinant
expression, a polypeptide of the invention can be
synthesized chemically using standard peptide synthesis
techniques.

An "isolated" or "purified" protein or biologically
active portion thereof is substantially free of cellular
material or other contaminating proteins from the cell or
tissue source from which the protein is derived, or
substantially free of chemical precursors or other
chemicals when chemically synthesized. The language
"substantially free of cellular material" includes
preparations of protein in which the protein is separated
from cellular components of the cells from which it is
isolated or recombinantly produced. Thus, protein that
is substantially free of cellular material includes
preparations of protein having less than about 30%, 20%,
10%, or 5% (by dry weight) of heterologous protein (also
referred to herein as a "contaminating protein"). When
the protein or biologically active portion thereof is
recombinantly produced, it is also preferably
substantially free of culture medium, i.e., culture
medium represents less than about 20%, 10%, or 5% of the
volume of the protein preparation. When the protein is
produced by chemical synthesis, it is preferably
substantially free of chemical precursors or other
chemicals, i.e., it is separated from chemical precursors
or other chemicals which are involved in the synthesis of
the protein. Accordingly, such preparations of the
protein have less than about 30%, 20%, 10%, or 5% (by dry
weight) of chemical precursors or compounds other than
the polypeptide of interest.

Biologically active portions of a polypeptide of the
invention include polypeptides comprising amino acid
sequences sufficiently identical to or derived from the
amino acid sequence of the protein (e.g., the amino acid
sequence shown in any of SEQ ID NOs:23-33, 54-63, and
- ), which include fewer amino acids than the full-length
protein, and exhibit at least one activity of the
corresponding full-length protein. Typically,
biologically active portions comprise a domain or motif
with at least one activity of the corresponding protein.
A biologically active portion of a protein of the
invention can be a polypeptide which is, for example, 10,
25, 50, 100 or more amino acids in length. Moreover,
other biologically active portions, in which other
regions of the protein are deleted, can be prepared by
recombinant techniques and evaluated for one or more of
the functional activities of the native form of a
polypeptide of the invention.

Preferred polypeptides have the amino acid sequence of
any of SEQ ID NOs:23-33, 54-63, and - . Other
useful proteins are substantially identical (e.g., at
least about 45%, preferably 55%, 65%, 75%, 85%, 95%, or
99%) to any of SEQ ID NOs:22-33, 54-63, and - and
retain the functional activity of the
corresponding naturally-occurring protein yet differ in
amino acid sequence due to natural allelic variation or
mutagenesis.

To determine the percent identity of two amino acid
sequences or of two nucleic acids, the sequences are
aligned for optimal comparison purposes (e.g., gaps can
be introduced in the sequence of a first amino acid or
nucleic acid sequence for optimal alignment with a second
amino or nucleic acid sequence). The amino acid residues
or nucleotides at corresponding amino acid positions or
nucleotide positions are then compared. When a position
in the first sequence is occupied by the same amino acid
residue or nucleotide as the corresponding position in
the second sequence, then the molecules are identical at
that position. The percent identity between the two
sequences is a function of the number of identical
positions shared by the sequences (i.e., % identity = #
of identical positions/total # of positions (e.g.,
overlapping positions) x 100). Preferably, the two
sequences are the same length.
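
The percent identity calculation described above can be sketched in Python as follows. This sketch assumes the two sequences have already been aligned and padded to equal length, with '-' marking a gap; the function name percent_identity and the count_gaps switch are illustrative choices, not terms used in the specification. By default only overlapping (ungapped) positions are counted in the denominator.

```python
GAP = "-"  # assumed gap character in the pre-aligned input


def percent_identity(aligned_a, aligned_b, count_gaps=False):
    """Percent identity of two pre-aligned sequences of equal length.

    % identity = (# identical positions / total # positions) x 100.
    With count_gaps=False the total is the number of overlapping
    positions (columns where neither sequence has a gap). With
    count_gaps=True, columns with a gap in one sequence also count
    toward the total, as mismatches.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    identical = 0
    total = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == GAP and b == GAP:
            continue  # a column of two gaps carries no information
        has_gap = a == GAP or b == GAP
        if has_gap and not count_gaps:
            continue
        total += 1
        if not has_gap and a == b:
            identical += 1  # only exact matches are counted
    if total == 0:
        raise ValueError("no comparable positions")
    return 100.0 * identical / total


if __name__ == "__main__":
    print(percent_identity("ACDEFG", "ACDQFG"))
    print(percent_identity("AC-EFG", "ACDEFG"))
    print(percent_identity("AC-EFG", "ACDEFG", count_gaps=True))
```

Which denominator is appropriate depends on the comparison being made; the alignment step itself (for example, with BLAST or ALIGN) is outside the scope of this sketch.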

The determination of percent homology between two
sequences can be accomplished using a mathematical
algorithm. A preferred, non-limiting example of a
mathematical algorithm utilized for the comparison of two
sequences is the algorithm of Karlin and Altschul (1990)
Proc. Natl. Acad. Sci. USA 87:2264-2268, modified as in
Karlin and Altschul (1993) Proc. Natl. Acad. Sci. USA
90:5873-5877. Such an algorithm is incorporated into the
NBLAST and XBLAST programs of Altschul, et al. (1990) J.
Mol. Biol. 215:403-410. BLAST nucleotide searches can be
performed with the NBLAST program, score = 100,
wordlength = 12 to obtain nucleotide sequences homologous
to a nucleic acid molecule of the invention. BLAST
protein searches can be performed with the XBLAST
program, score = 50, wordlength = 3 to obtain amino acid
sequences homologous to a protein molecule of the
invention. To obtain gapped alignments for comparison
purposes, Gapped BLAST can be utilized as described in
Altschul et al. (1997) Nucleic Acids Res. 25:3389-3402.
Alternatively, PSI-Blast can be used to perform an
iterated search which detects distant relationships
between molecules. Id. When utilizing BLAST, Gapped
BLAST, and PSI-Blast programs, the default parameters of
the respective programs (e.g., XBLAST and NBLAST) can be
used. See http://www.ncbi.nlm.nih.gov. Another
preferred, non-limiting example of a mathematical
algorithm utilized for the comparison of sequences is the
algorithm of Myers and Miller, (1988) CABIOS 4:11-17.
Such an algorithm is incorporated into the ALIGN program
(version 2.0) which is part of the GCG sequence alignment
software package. When utilizing the ALIGN program for
comparing amino acid sequences, a PAM120 weight residue
table, a gap length penalty of 12, and a gap penalty of 4
can be used.

The percent identity between two sequences can be 
determined using techniques similar to those described 
above, with or without allowing gaps. In calculating 
percent identity, only exact matches are counted. 

The invention also provides chimeric or fusion
proteins. As used herein, a "chimeric protein" or
"fusion protein" comprises all or part (preferably
biologically active) of a polypeptide of the invention
operably linked to a heterologous polypeptide (i.e., a
polypeptide other than the same polypeptide of the
invention). Within the fusion protein, the term
"operably linked" is intended to indicate that the
polypeptide of the invention and the heterologous
polypeptide are fused in-frame to each other. The
heterologous polypeptide can be fused to the N-terminus
or C-terminus of the polypeptide of the invention.
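
What "fused in-frame" means at the nucleic acid level can be illustrated with a short Python sketch. It assumes each partner is supplied as a DNA coding sequence whose length is a multiple of three and that starts in frame; the function name fuse_in_frame and the toy sequences are hypothetical. The sketch drops a terminal stop codon from the N-terminal partner and rejects any fusion that would contain a premature in-frame stop.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def split_codons(cds):
    """Split a coding sequence into consecutive three-base codons."""
    return [cds[i:i + 3] for i in range(0, len(cds), 3)]


def fuse_in_frame(n_terminal_cds, c_terminal_cds):
    """Join two coding sequences so the reading frame is preserved.

    Both inputs are assumed to begin in frame. A terminal stop codon on
    the N-terminal partner is removed so translation continues into the
    C-terminal partner.
    """
    n_cds = n_terminal_cds.upper()
    c_cds = c_terminal_cds.upper()
    for label, cds in (("N-terminal", n_cds), ("C-terminal", c_cds)):
        if len(cds) == 0 or len(cds) % 3 != 0:
            raise ValueError(f"{label} sequence length is not a multiple of 3")
    n_codons = split_codons(n_cds)
    if n_codons[-1] in STOP_CODONS:
        n_codons = n_codons[:-1]  # read through into the second partner
    fused = n_codons + split_codons(c_cds)
    # Any stop codon before the final codon would truncate the fusion.
    if any(codon in STOP_CODONS for codon in fused[:-1]):
        raise ValueError("premature in-frame stop codon in fusion")
    return "".join(fused)


if __name__ == "__main__":
    # Toy sequences for illustration only.
    print(fuse_in_frame("ATGGCCTAA", "ATGAAATGA"))
```

Swapping the argument order places the heterologous partner at the other terminus, corresponding to the N-terminal or C-terminal fusions mentioned in the text.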

One useful fusion protein is a GST fusion protein in
which the polypeptide of the invention is fused to the
C-terminus of GST sequences. Such fusion proteins can
facilitate the purification of a recombinant polypeptide
of the invention.

In another embodiment, the fusion protein contains a
heterologous signal sequence at its N-terminus. For
example, the native signal sequence of a polypeptide of
the invention can be removed and replaced with a signal
sequence from another protein. For example, the gp67
secretory sequence of the baculovirus envelope protein
can be used as a heterologous signal sequence (Current
Protocols in Molecular Biology, Ausubel et al., eds.,
John Wiley & Sons, 1992). Other examples of eukaryotic
heterologous signal sequences include the secretory
sequences of melittin and human placental alkaline
phosphatase (Stratagene; La Jolla, California). In yet
another example, useful prokaryotic heterologous signal
sequences include the phoA secretory signal (Sambrook et
al., supra) and the protein A secretory signal (Pharmacia
Biotech; Piscataway, New Jersey).

In yet another embodiment, the fusion protein is an
immunoglobulin fusion protein in which all or part of a
polypeptide of the invention is fused to sequences
derived from a member of the immunoglobulin protein
family. The immunoglobulin fusion proteins of the
invention can be incorporated into pharmaceutical
compositions and administered to a subject to inhibit an
interaction between a ligand (soluble or membrane-bound)
and a protein on the surface of a cell (receptor), to
thereby suppress signal transduction in vivo. The
immunoglobulin fusion protein can be used to affect the
bioavailability of a cognate ligand of a polypeptide of
the invention. Inhibition of ligand/receptor interaction
may be useful therapeutically, both for treating
proliferative and differentiative disorders and for
modulating (e.g., promoting or inhibiting) cell survival.
Moreover, the immunoglobulin fusion proteins of the
invention can be used as immunogens to produce antibodies
directed against a polypeptide of the invention in a
subject, to purify ligands and in screening assays to
identify molecules which inhibit the interaction of
receptors with ligands.

Chimeric and fusion proteins of the invention can be
produced by standard recombinant DNA techniques. In
another embodiment, the fusion gene can be synthesized by
conventional techniques including automated DNA
synthesizers. Alternatively, PCR amplification of gene
fragments can be carried out using anchor primers which
give rise to complementary overhangs between two
consecutive gene fragments which can subsequently be
annealed and reamplified to generate a chimeric gene
sequence (see, e.g., Ausubel et al., supra). Moreover,
many expression vectors are commercially available that
already encode a fusion moiety (e.g., a GST polypeptide).
A nucleic acid encoding a polypeptide of the invention
can be cloned into such an expression vector such that
the fusion moiety is linked in-frame to the polypeptide
of the invention.

A signal sequence of a polypeptide of the invention
(SEQ ID NOs:64-75) can be used to facilitate secretion
and isolation of the secreted protein or other proteins
of interest. Signal sequences are typically
characterized by a core of hydrophobic amino acids which
are generally cleaved from the mature protein during
secretion in one or more cleavage events. Such signal
peptides contain processing sites that allow cleavage of
the signal sequence from the mature proteins as they pass
through the secretory pathway. Thus, the invention
pertains to the described polypeptides having a signal
sequence, as well as to the signal sequence itself and to
the polypeptide in the absence of the signal sequence
(i.e., the cleavage products). In one embodiment, a
nucleic acid sequence encoding a signal sequence of the
invention can be operably linked in an expression vector
to a protein of interest, such as a protein which is
ordinarily not secreted or is otherwise difficult to
isolate. The signal sequence directs secretion of the
protein, such as from a eukaryotic host into which the
expression vector is transformed, and the signal sequence
is subsequently or concurrently cleaved. The protein can
then be readily purified from the extracellular medium by
art recognized methods. Alternatively, the signal
sequence can be linked to the protein of interest using a
sequence which facilitates purification, such as with a
GST domain.

In another embodiment, the signal sequences of the
present invention can be used to identify regulatory
sequences, e.g., promoters, enhancers, repressors. Since
signal sequences are the most amino-terminal sequences of
a peptide, it is expected that the nucleic acids which
flank the signal sequence on its amino-terminal side will
be regulatory sequences which affect transcription.
Thus, a nucleotide sequence which encodes all or a
portion of a signal sequence can be used as a probe to
identify and isolate signal sequences and their flanking
regions, and these flanking regions can be studied to
identify regulatory elements therein.

The present invention also pertains to variants of the
polypeptides of the invention. Such variants have an
altered amino acid sequence which can function as either
agonists (mimetics) or as antagonists. Variants can be
generated by mutagenesis, e.g., discrete point mutation
or truncation. An agonist can retain substantially the
same, or a subset, of the biological activities of the
naturally occurring form of the protein. An antagonist
of a protein can inhibit one or more of the activities of
the naturally occurring form of the protein by, for
example, competitively binding to a downstream or
upstream member of a cellular signaling cascade which
includes the protein of interest. Thus, specific
biological effects can be elicited by treatment with a
variant of limited function. Treatment of a subject with
a variant having a subset of the biological activities of
the naturally occurring form of the protein can have
fewer side effects in a subject relative to treatment
with the naturally occurring form of the protein.

Variants of a protein of the invention which function
as either agonists (mimetics) or as antagonists can be
identified by screening combinatorial libraries of
mutants, e.g., truncation mutants, of the protein of the
invention for agonist or antagonist activity. In one
embodiment, a variegated library of variants is generated
by combinatorial mutagenesis at the nucleic acid level
and is encoded by a variegated gene library. A
variegated library of variants can be produced by, for
example, enzymatically ligating a mixture of synthetic
oligonucleotides into gene sequences such that a
degenerate set of potential protein sequences is
expressible as individual polypeptides, or alternatively,
as a set of larger fusion proteins (e.g., for phage
display). There are a variety of methods which can be
used to produce libraries of potential variants of the
polypeptides of the invention from a degenerate
oligonucleotide sequence. Methods for synthesizing
degenerate oligonucleotides are known in the art (see,
e.g., Narang (1983) Tetrahedron 39:3; Itakura et al.
(1984) Annu. Rev. Biochem. 53:323; Itakura et al. (1984)
Science 198:1056; Ike et al. (1983) Nucleic Acid Res.
11:477).
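
How a degenerate oligonucleotide sequence stands for a set of concrete sequences can be shown with a brief Python sketch. It uses the standard IUPAC nucleotide ambiguity codes; the function names expand_degenerate and library_size, and the example oligonucleotides (including the NNK codon, chosen here purely as a familiar illustration), are assumptions of this sketch rather than part of the specification.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC_CODES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}


def _choices(oligo):
    """Map each position of the oligonucleotide to its allowed bases."""
    try:
        return [IUPAC_CODES[symbol] for symbol in oligo.upper()]
    except KeyError as exc:
        raise ValueError(f"not an IUPAC nucleotide code: {exc.args[0]}") from None


def library_size(oligo):
    """Number of distinct sequences encoded by a degenerate oligonucleotide."""
    size = 1
    for bases in _choices(oligo):
        size *= len(bases)
    return size


def expand_degenerate(oligo):
    """List every concrete sequence encoded by a degenerate oligonucleotide."""
    return ["".join(combo) for combo in product(*_choices(oligo))]


if __name__ == "__main__":
    print(expand_degenerate("ARY"))
    print(library_size("NNK"))
```

Computing the size before expanding is prudent, since the number of sequences grows multiplicatively with each degenerate position.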

In addition, libraries of fragments of the coding
sequence of a polypeptide of the invention can be used to
generate a variegated population of polypeptides for
screening and subsequent selection of variants. For
example, a library of coding sequence fragments can be
generated by treating a double stranded PCR fragment of
the coding sequence of interest with a nuclease under
conditions wherein nicking occurs only about once per
molecule, denaturing the double stranded DNA, renaturing
the DNA to form double stranded DNA which can include
sense/antisense pairs from different nicked products,
removing single stranded portions from reformed duplexes
by treatment with S1 nuclease, and ligating the resulting
fragment library into an expression vector. By this
method, an expression library can be derived which
encodes N-terminal and internal fragments of various
sizes of the protein of interest.

Several techniques are known in the art for screening
gene products of combinatorial libraries made by point
mutations or truncation, and for screening cDNA libraries
for gene products having a selected property. The most
widely used techniques, which are amenable to high
through-put analysis, for screening large gene libraries
typically include cloning the gene library into
replicable expression vectors, transforming appropriate
cells with the resulting library of vectors, and
expressing the combinatorial genes under conditions in
which detection of a desired activity facilitates
isolation of the vector encoding the gene whose product
was detected. Recursive ensemble mutagenesis (REM), a
technique which enhances the frequency of functional
mutants in the libraries, can be used in combination with
the screening assays to identify variants of a protein of
the invention (Arkin and Youvan (1992) Proc. Natl. Acad.
Sci. USA 89:7811-7815; Delagrave et al. (1993) Protein
Engineering 6(3):327-331).

An isolated polypeptide of the invention, or a fragment
thereof, can be used as an immunogen to generate
antibodies using standard techniques for polyclonal and
monoclonal antibody preparation. The full-length
polypeptide or protein can be used or, alternatively, the
invention provides antigenic peptide fragments for use as
immunogens. The antigenic peptide of a protein of the
invention comprises at least 8 (preferably 10, 15, 20, or
30) amino acid residues of the amino acid sequence shown
in any of SEQ ID NOs:23-33, 54-64, and - and
encompasses an epitope of the protein such that an
antibody raised against the peptide forms a specific
immune complex with the protein.

Preferred epitopes encompassed by the antigenic peptide
are regions that are located on the surface of the
protein, e.g., hydrophilic regions, rather than
hydrophobic regions, e.g., transmembrane domains. The
hydrophilicity of a protein sequence can be easily
determined using readily available programs.
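
As one illustration of the kind of calculation such programs perform, the following Python sketch computes a sliding-window hydropathy profile using the Kyte-Doolittle scale, on which negative values are hydrophilic and positive values hydrophobic. The choice of this particular scale, the window size, and the function names are assumptions of the sketch; the specification does not name a program or scale.

```python
# Kyte-Doolittle hydropathy values (one-letter amino acid codes).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=7):
    """Mean hydropathy of every window of the given size.

    Element i of the result covers sequence[i:i + window].
    """
    seq = sequence.upper()
    if window < 1 or window > len(seq):
        raise ValueError("window must be between 1 and the sequence length")
    values = [KYTE_DOOLITTLE[residue] for residue in seq]
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


def most_hydrophilic_window(sequence, window=7):
    """Start index of the window with the lowest mean hydropathy."""
    profile = hydropathy_profile(sequence, window)
    return min(range(len(profile)), key=profile.__getitem__)


if __name__ == "__main__":
    toy_sequence = "IIIIKKKKIIII"  # illustrative only
    start = most_hydrophilic_window(toy_sequence, window=4)
    print(start, toy_sequence[start:start + 4])
```

A low-hydropathy window is only a rough proxy for surface exposure; candidate peptides chosen this way would still need to be confirmed experimentally.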

An immunogen typically is used to prepare antibodies by
immunizing a suitable subject (e.g., rabbit, goat, mouse
or other mammal). An appropriate immunogenic preparation
can contain, for example, recombinantly expressed or
chemically synthesized polypeptide. The preparation can
further include an adjuvant, such as Freund's complete or
incomplete adjuvant, or similar immunostimulatory agent.

Accordingly, another aspect of the invention pertains
to antibodies directed against a polypeptide of the
invention. The term "antibody" as used herein refers to
immunoglobulin molecules and immunologically active
portions of immunoglobulin molecules, i.e., molecules
that contain an antigen binding site which specifically
binds an antigen, such as a polypeptide of the invention.
A molecule which specifically binds to a given
polypeptide of the invention is a molecule which binds
the polypeptide, but does not substantially bind other
molecules in a sample, e.g., a biological sample, which
naturally contains the polypeptide. Examples of
immunologically active portions of immunoglobulin
molecules include F(ab) and F(ab')2 fragments which can be
generated by treating the antibody with an enzyme such as
pepsin. The invention provides polyclonal and monoclonal
antibodies. The term "monoclonal antibody" or
"monoclonal antibody composition", as used herein, refers
to a population of antibody molecules that contain only
one species of an antigen binding site capable of
immunoreacting with a particular epitope.

Polyclonal antibodies can be prepared as described 
above by immunizing a suitable subject with a polypeptide 
of the invention as an immunogen. The antibody titer in
the immunized subject can be monitored over time by
standard techniques, such as with an enzyme linked
immunosorbent assay (ELISA) using immobilized
polypeptide. If desired, the antibody molecules can be
isolated from the mammal (e.g., from the blood) and
further purified by well-known techniques, such as
protein A chromatography to obtain the IgG fraction. At
an appropriate time after immunization, e.g., when the
specific antibody titers are highest, antibody-producing
cells can be obtained from the subject and used to
prepare monoclonal antibodies by standard techniques,
such as the hybridoma technique originally described by
Köhler and Milstein (1975) Nature 256:495-497, the human
B cell hybridoma technique (Kozbor et al. (1983) Immunol.
Today 4:72), the EBV-hybridoma technique (Cole et al.
(1985), Monoclonal Antibodies and Cancer Therapy, Alan R.
Liss, Inc., pp. 77-96) or trioma techniques. The
technology for producing hybridomas is well known (see
generally Current Protocols in Immunology (1994) Coligan
et al. (eds.) John Wiley & Sons, Inc., New York, NY).
Hybridoma cells producing a monoclonal antibody of the
invention are detected by screening the hybridoma culture
supernatants for antibodies that bind the polypeptide of
interest, e.g., using a standard ELISA assay.
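
The titer monitoring described above can be sketched in Python as follows. The endpoint-titer convention used here (the highest dilution in an unbroken series whose signal exceeds a cutoff), the data layout, and the function names are assumptions for illustration only; the specification does not prescribe how titers are calculated.

```python
def endpoint_titer(readings, cutoff):
    """Return the highest dilution factor still giving a signal above cutoff.

    readings maps a dilution factor (e.g. 1000 for a 1:1000 dilution) to the
    measured ELISA signal. Dilutions are walked from least to most dilute and
    the walk stops at the first reading at or below the cutoff.
    """
    titer = 0
    for dilution in sorted(readings):
        if readings[dilution] <= cutoff:
            break
        titer = dilution
    return titer


def peak_titer_day(time_course, cutoff):
    """Return (day with the highest titer, titers per day).

    time_course maps a sampling day to a readings dict as used above.
    """
    titers = {day: endpoint_titer(r, cutoff) for day, r in time_course.items()}
    return max(titers, key=titers.get), titers
```

Under these assumptions, the day returned by `peak_titer_day` corresponds to the "appropriate time after immunization" at which antibody-producing cells would be harvested.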

Alternative to preparing monoclonal antibody-secreting
hybridomas, a monoclonal antibody directed against a
polypeptide of the invention can be identified and
isolated by screening a recombinant combinatorial
immunoglobulin library (e.g., an antibody phage display
library) with the polypeptide of interest. Kits for
generating and screening phage display libraries are
commercially available (e.g., the Pharmacia Recombinant
Phage Antibody System, Catalog No. 27-9400-01; and the
Stratagene SurfZAP™ Phage Display Kit, Catalog No.
240612). Additionally, examples of methods and reagents
particularly amenable for use in generating and screening
an antibody display library can be found in, for example,
U.S. Patent No. 5,223,409; PCT Publication No. WO
92/18619; PCT Publication No. WO 91/17271; PCT
Publication No. WO 92/20791; PCT Publication No. WO
92/15679; PCT Publication No. WO 93/01288; PCT
Publication No. WO 92/01047; PCT Publication No. WO
92/09690; PCT Publication No. WO 90/02809; Fuchs et al.
(1991) Bio/Technology 9:1370-1372; Hay et al. (1992) Hum.
Antibod. Hybridomas 3:81-85; Huse et al. (1989) Science
246:1275-1281; Griffiths et al. (1993) EMBO J. 12:725-
734.

Additionally, recombinant antibodies, such as chimeric
and humanized monoclonal antibodies, comprising both
human and non-human portions, which can be made using
standard recombinant DNA techniques, are within the scope
of the invention. Such chimeric and humanized monoclonal
antibodies can be produced by recombinant DNA techniques
known in the art, for example using methods described in
PCT Publication No. WO 87/02671; European Patent
Application 184,187; European Patent Application 171,496;
European Patent Application 173,494; PCT Publication No.
WO 86/01533; U.S. Patent No. 4,816,567; European Patent
Application 125,023; Better et al. (1988) Science
240:1041-1043; Liu et al. (1987) Proc. Natl. Acad. Sci.
USA 84:3439-3443; Liu et al. (1987) J. Immunol.
139:3521-3526; Sun et al. (1987) Proc. Natl. Acad. Sci.
USA 84:214-218; Nishimura et al. (1987) Canc. Res.
47:999-1005; Wood et al. (1985) Nature 314:446-449;
Shaw et al. (1988) J. Natl. Cancer Inst. 80:1553-1559;
Morrison (1985) Science 229:1202-1207; Oi et al. (1986)
Bio/Techniques 4:214; U.S. Patent 5,225,539; Jones et al.
(1986) Nature 321:522-525; Verhoeyen et al. (1988)
Science 239:1534; and Beidler et al. (1988) J. Immunol.
141:4053-4060.

Completely human antibodies are particularly desirable 
for therapeutic treatment of human patients. Such 
antibodies can be produced using transgenic mice which 
are incapable of expressing endogenous immunoglobulin
heavy and light chain genes, but which can express human
heavy and light chain genes. The transgenic mice are
immunized in the normal fashion with a selected antigen,
e.g., all or a portion of a polypeptide of the invention.
Monoclonal antibodies directed against the antigen can be
obtained using conventional hybridoma technology. The
human immunoglobulin transgenes harbored by the
transgenic mice rearrange during B cell differentiation,
and subsequently undergo class switching and somatic
mutation. Thus, using such a technique, it is possible
to produce therapeutically useful IgG, IgA and IgE
antibodies. For an overview of this technology for 
producing human antibodies, see Lonberg and Huszar (1995, 
Int. Rev. Immunol. 13:65-93). For a detailed discussion 
of this technology for producing human antibodies and
human monoclonal antibodies and protocols for producing
such antibodies, see, e.g., U.S. Patent 5,625,126; U.S.
Patent 5,633,425; U.S. Patent 5,569,825; U.S. Patent
5,661,016; and U.S. Patent 5,545,806. In addition,
companies such as Abgenix, Inc. (Fremont, CA), can be
engaged to provide human antibodies directed against a
selected antigen using technology similar to that 
described above. 

Completely human antibodies which recognize a selected 
epitope can be generated using a technique referred to as
"guided selection." In this approach a selected
non-human monoclonal antibody, e.g., a murine antibody,
is used to guide the selection of a completely human 
antibody recognizing the same epitope. 

An antibody directed against a polypeptide of the
invention (e.g., monoclonal antibody) can be used to
isolate the polypeptide by standard techniques, such as
affinity chromatography or immunoprecipitation.
Moreover, such an antibody can be used to detect the
protein (e.g., in a cellular lysate or cell supernatant)
in order to evaluate the abundance and pattern of
expression of the polypeptide. The antibodies can also
be used diagnostically to monitor protein levels in
tissue as part of a clinical testing procedure, e.g., to
determine the efficacy of a given treatment
regimen. Detection can be facilitated by coupling the
antibody to a detectable substance. Examples of
detectable substances include various enzymes, prosthetic
groups, fluorescent materials, luminescent materials,
bioluminescent materials, and radioactive materials.
Examples of suitable enzymes include horseradish
peroxidase, alkaline phosphatase, β-galactosidase, or
acetylcholinesterase; examples of suitable prosthetic
group complexes include streptavidin/biotin and
avidin/biotin; examples of suitable fluorescent materials
include umbelliferone, fluorescein, fluorescein
isothiocyanate, rhodamine, dichlorotriazinylamine
fluorescein, dansyl chloride or phycoerythrin; an example
of a luminescent material includes luminol; examples of
bioluminescent materials include luciferase, luciferin,
and aequorin; and examples of suitable radioactive
material include ¹²⁵I, ¹³¹I, ³⁵S or ³H.

III. Recombinant Expression Vectors and Host Cells 

Another aspect of the invention pertains to vectors, 
preferably expression vectors, containing a nucleic acid
encoding a polypeptide of the invention (or a portion
thereof). As used herein, the term "vector" refers to a
nucleic acid molecule capable of transporting another
nucleic acid to which it has been linked. One type of
vector is a "plasmid", which refers to a circular double
stranded DNA loop into which additional DNA segments can
be ligated. Another type of vector is a viral vector,
wherein additional DNA segments can be ligated into the
viral genome. Certain vectors are capable of autonomous
replication in a host cell into which they are introduced
(e.g., bacterial vectors having a bacterial origin of
replication and episomal mammalian vectors). Other
vectors (e.g., non-episomal mammalian vectors) are
integrated into the genome of a host cell upon
introduction into the host cell, and thereby are
replicated along with the host genome. Moreover, certain
vectors, expression vectors, are capable of directing the
expression of genes to which they are operably linked.
In general, expression vectors of utility in recombinant
DNA techniques are often in the form of plasmids
(vectors). However, the invention is intended to include
such other forms of expression vectors, such as viral
vectors (e.g., replication defective retroviruses,
adenoviruses and adeno-associated viruses), which serve
equivalent functions.

The recombinant expression vectors of the invention 
comprise a nucleic acid of the invention in a form 
suitable for expression of the nucleic acid in a host 
cell. This means that the recombinant expression vectors
include one or more regulatory sequences, selected on the
basis of the host cells to be used for expression, which
is operably linked to the nucleic acid sequence to be
expressed. Within a recombinant expression vector,
"operably linked" is intended to mean that the nucleotide
sequence of interest is linked to the regulatory
sequence(s) in a manner which allows for expression of
the nucleotide sequence (e.g., in an in vitro
transcription/translation system or in a host cell when
the vector is introduced into the host cell). The term
"regulatory sequence" is intended to include promoters,
enhancers and other expression control elements (e.g.,
polyadenylation signals). Such regulatory sequences are
described, for example, in Goeddel, Gene Expression
Technology: Methods in Enzymology 185, Academic Press,
San Diego, CA (1990). Regulatory sequences include those
which direct constitutive expression of a nucleotide
sequence in many types of host cell and those which
direct expression of the nucleotide sequence only in
certain host cells (e.g., tissue-specific regulatory
sequences). It will be appreciated by those skilled in
the art that the design of the expression vector can
depend on such factors as the choice of the host cell to
be transformed, the level of expression of protein
desired, etc. The expression vectors of the invention
can be introduced into host cells to thereby produce
proteins or peptides, including fusion proteins or 
peptides, encoded by nucleic acids as described herein. 

The recombinant expression vectors of the invention can 
be designed for expression of a polypeptide of the
invention in prokaryotic or eukaryotic cells, e.g.,
bacterial cells such as E. coli, insect cells (using
baculovirus expression vectors), yeast cells or mammalian
cells. Suitable host cells are discussed further in
Goeddel, supra. Alternatively, the recombinant
expression vector can be transcribed and translated in
vitro, for example using T7 promoter regulatory sequences 
and T7 polymerase. 

Expression of proteins in prokaryotes is most often 
carried out in E. coli with vectors containing
constitutive or inducible promoters directing the
expression of either fusion or non-fusion proteins.
Fusion vectors add a number of amino acids to a protein
encoded therein, usually to the amino terminus of the
recombinant protein. Such fusion vectors typically serve
three purposes: 1) to increase expression of recombinant
protein; 2) to increase the solubility of the recombinant
protein; and 3) to aid in the purification of the
recombinant protein by acting as a ligand in affinity
purification. Often, in fusion expression vectors, a
proteolytic cleavage site is introduced at the junction
of the fusion moiety and the recombinant protein to
enable separation of the recombinant protein from the
fusion moiety subsequent to purification of the fusion
protein. Such enzymes, and their cognate recognition
sequences, include Factor Xa, thrombin and enterokinase.
Typical fusion expression vectors include pGEX (Pharmacia
Biotech Inc; Smith and Johnson (1988) Gene 67:31-40),
pMAL (New England Biolabs, Beverly, MA) and pRIT5
(Pharmacia, Piscataway, NJ) which fuse glutathione S-
transferase (GST), maltose E binding protein, or protein
A, respectively, to the target recombinant protein.
Examples of suitable inducible non-fusion E. coli
expression vectors include pTrc (Amann et al., (1988)
Gene 69:301-315) and pET 11d (Studier et al., Gene
Expression Technology: Methods in Enzymology 185,
Academic Press, San Diego, California (1990) 60-89).
Target gene expression from the pTrc vector relies on
host RNA polymerase transcription from a hybrid trp-lac
fusion promoter. Target gene expression from the pET 11d
vector relies on transcription from a T7 gn10-lac fusion
promoter mediated by a coexpressed viral RNA polymerase
(T7 gn1). This viral polymerase is supplied by host
strains BL21(DE3) or HMS174(DE3) from a resident λ
prophage harboring a T7 gn1 gene under the
transcriptional control of the lacUV5 promoter.
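
The role of the proteolytic cleavage site at the fusion junction, described above, can be illustrated with a short Python sketch that locates protease recognition motifs in a fusion protein sequence and splits the sequence at them. The motifs used (IEGR for Factor Xa, LVPRGS for thrombin, DDDDK for enterokinase) are commonly cited consensus sequences and are assumptions here, as are the function names and the example sequence; the specification names the enzymes but not their recognition sequences.

```python
# Commonly cited consensus motifs (assumed for illustration only).
# Each entry: enzyme -> (motif, number of motif residues N-terminal to the cut).
CLEAVAGE_MOTIFS = {
    "Factor Xa": ("IEGR", 4),      # cut after the Arg of Ile-Glu-Gly-Arg
    "thrombin": ("LVPRGS", 4),     # cut between Arg and Gly
    "enterokinase": ("DDDDK", 5),  # cut after the Lys of Asp-Asp-Asp-Asp-Lys
}


def find_cleavage_sites(protein):
    """Return (enzyme, cut position) pairs sorted by position.

    The cut position is the index of the first residue after the cut.
    """
    protein = protein.upper()
    hits = []
    for enzyme, (motif, offset) in CLEAVAGE_MOTIFS.items():
        start = protein.find(motif)
        while start != -1:
            hits.append((enzyme, start + offset))
            start = protein.find(motif, start + 1)
    return sorted(hits, key=lambda hit: hit[1])


def digest(protein, enzyme):
    """Split the protein at every recognition site of the named enzyme."""
    protein = protein.upper()
    cuts = [pos for name, pos in find_cleavage_sites(protein) if name == enzyme]
    fragments, previous = [], 0
    for cut in cuts:
        fragments.append(protein[previous:cut])
        previous = cut
    fragments.append(protein[previous:])
    return fragments
```

For a hypothetical fusion such as `"MSPILGYWKILVPRGSMKTAYIAK"`, `digest(..., "thrombin")` separates the N-terminal fusion moiety from the downstream target. A check of this kind also flags target proteins that happen to contain an internal recognition site for the chosen enzyme.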

One strategy to maximize recombinant protein expression 
in E. coli is to express the protein in a host bacteria 
with an impaired capacity to proteolytically cleave the
recombinant protein (Gottesman, Gene Expression
Technology: Methods in Enzymology 185, Academic Press,
San Diego, California (1990) 119-128). Another strategy
is to alter the nucleic acid sequence of the nucleic acid
to be inserted into an expression vector so that the
individual codons for each amino acid are those
preferentially utilized in E. coli (Wada et al. (1992)
Nucleic Acids Res. 20:2111-2118). Such alteration of
nucleic acid sequences of the invention can be carried
out by standard DNA synthesis techniques. 
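
The codon-alteration strategy described above can be sketched as a back-translation of the encoded protein using one preferred codon per amino acid. The table below is an illustrative assumption (one frequently used E. coli codon per residue) and is not taken from the specification or from Wada et al.; in practice the preferences would be drawn from a current codon-usage table for the chosen host strain. The function name is likewise hypothetical.

```python
# Illustrative table: one frequently used E. coli codon per amino acid.
# These choices are assumptions for the sketch, not values from the source.
PREFERRED_CODONS = {
    "A": "GCG", "R": "CGT", "N": "AAC", "D": "GAT", "C": "TGC",
    "Q": "CAG", "E": "GAA", "G": "GGC", "H": "CAT", "I": "ATT",
    "L": "CTG", "K": "AAA", "M": "ATG", "F": "TTT", "P": "CCG",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAT", "V": "GTG",
    "*": "TAA",  # stop codon
}


def back_translate(protein, table=PREFERRED_CODONS, add_stop=True):
    """Return a DNA sequence encoding the protein with the table's codons."""
    protein = protein.upper().rstrip("*")
    unknown = sorted(set(protein) - set(table))
    if unknown:
        raise ValueError(f"no codon defined for residues: {unknown}")
    dna = "".join(table[aa] for aa in protein)
    return dna + table["*"] if add_stop else dna
```

The resulting sequence encodes the same polypeptide as the original nucleic acid and could then be produced by DNA synthesis, as the text notes.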

In another embodiment, the expression vector is a yeast
expression vector. Examples of vectors for expression in
yeast S. cerevisiae include pYepSec1 (Baldari et al.
(1987) EMBO J. 6:229-234), pMFa (Kurjan and Herskowitz,
(1982) Cell 30:933-943), pJRY88 (Schultz et al. (1987)
Gene 54:113-123), pYES2 (Invitrogen Corporation, San
Diego, CA), and pPicZ (Invitrogen Corp, San Diego, CA).
Alternatively, the expression vector is a baculovirus
expression vector. Baculovirus vectors available for
expression of proteins in cultured insect cells (e.g., Sf
9 cells) include the pAc series (Smith et al. (1983) Mol.
Cell Biol. 3:2156-2165) and the pVL series (Luckow and
Summers (1989) Virology 170:31-39).

In yet another embodiment, a nucleic acid of the 
invention is expressed in mammalian cells using a 
mammalian expression vector. Examples of mammalian
expression vectors include pCDM8 (Seed (1987) Nature
329:840) and pMT2PC (Kaufman et al. (1987) EMBO J. 6:187-
195). When used in mammalian cells, the expression
vector's control functions are often provided by viral
regulatory elements. For example, commonly used
promoters are derived from polyoma, Adenovirus 2,
cytomegalovirus and Simian Virus 40. For other suitable
expression systems for both prokaryotic and eukaryotic
cells see chapters 16 and 17 of Sambrook et al., supra.
In another embodiment, the recombinant mammalian
expression vector is capable of directing expression of
the nucleic acid preferentially in a particular cell type
(e.g., tissue-specific regulatory elements are used to
express the nucleic acid). Tissue-specific regulatory
elements are known in the art. Non-limiting examples of
suitable tissue-specific promoters include the albumin
promoter (liver-specific; Pinkert et al. (1987) Genes
Dev. 1:268-277), lymphoid-specific promoters (Calame and
Eaton (1988) Adv. Immunol. 43:235-275), in particular
promoters of T cell receptors (Winoto and Baltimore
(1989) EMBO J. 8:729-733) and immunoglobulins (Banerji et
al. (1983) Cell 33:729-740; Queen and Baltimore (1983)
Cell 33:741-748), neuron-specific promoters (e.g., the
neurofilament promoter; Byrne and Ruddle (1989) Proc.
Natl. Acad. Sci. USA 86:5473-5477), pancreas-specific
promoters (Edlund et al. (1985) Science 230:912-916), and
mammary gland-specific promoters (e.g., milk whey
promoter; U.S. Patent No. 4,873,316 and European
Application Publication No. 264,166). Developmentally-
regulated promoters are also encompassed, for example the
murine hox promoters (Kessel and Gruss (1990) Science
249:374-379) and the α-fetoprotein promoter (Campes and
Tilghman (1989) Genes Dev. 3:537-546).

The invention further provides a recombinant expression 
vector comprising a DNA molecule of the invention cloned
into the expression vector in an antisense orientation.
That is, the DNA molecule is operably linked to a
regulatory sequence in a manner which allows for
expression (by transcription of the DNA molecule) of an
RNA molecule which is antisense to the mRNA encoding a
polypeptide of the invention. Regulatory sequences
operably linked to a nucleic acid cloned in the antisense
orientation can be chosen which direct the continuous
expression of the antisense RNA molecule in a variety of
cell types, for instance viral promoters and/or
enhancers, or regulatory sequences can be chosen which
direct constitutive, tissue specific or cell type
specific expression of antisense RNA. The antisense
expression vector can be in the form of a recombinant
plasmid, phagemid or attenuated virus in which antisense
nucleic acids are produced under the control of a high
efficiency regulatory region, the activity of which can
be determined by the cell type into which the vector is
introduced. For a discussion of the regulation of gene
expression using antisense genes see Weintraub et al.
(Reviews - Trends in Genetics, Vol. 1(1) 1986).
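
The relationship between a coding sequence and the antisense RNA produced when that sequence is cloned in the reverse orientation can be shown with a small Python sketch. The function names and the example sequence are hypothetical, and the sketch ignores vector-derived flanking sequence; it only illustrates that the transcript is the reverse complement of the mRNA.

```python
# Translation table for complementing DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of a DNA sequence, written 5' to 3'."""
    return dna.upper().translate(_COMPLEMENT)[::-1]


def sense_mrna(coding_dna):
    """Return the mRNA corresponding to the coding (sense) strand."""
    return coding_dna.upper().replace("T", "U")


def antisense_rna(coding_dna):
    """Return the RNA transcribed when the insert is in antisense orientation.

    Transcription then reads the opposite strand, so the product is the
    reverse complement of the sense mRNA, written 5' to 3'.
    """
    return reverse_complement(coding_dna).replace("T", "U")
```

For the coding fragment `ATGGCC`, the sense mRNA is `AUGGCC` and the antisense transcript is `GGCCAU`, which base-pairs with the mRNA along its full length.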

Another aspect of the invention pertains to host cells
into which a recombinant expression vector of the
invention has been introduced. The terms "host cell" and
"recombinant host cell" are used interchangeably herein.
It is understood that such terms refer not only to the
particular subject cell but to the progeny or potential
progeny of such a cell. Because certain modifications
may occur in succeeding generations due to either
mutation or environmental influences, such progeny may
not, in fact, be identical to the parent cell, but are
still included within the scope of the term as used 
herein. 

A host cell can be any prokaryotic (e.g., E. coli) or 
eukaryotic (e.g., an insect cell, a yeast cell or a
mammalian cell) cell.

Vector DNA can be introduced into prokaryotic or 
eukaryotic cells via conventional transformation or 
transfection techniques. As used herein, the terms
"transformation" and "transfection" are intended to refer
to a variety of art-recognized techniques for introducing
foreign nucleic acid into a host cell, including calcium
phosphate or calcium chloride co-precipitation, DEAE-
dextran-mediated transfection, lipofection, or
electroporation. Suitable methods for transforming or
transfecting host cells can be found in Sambrook, et al.
(supra), and other laboratory manuals. 

For stable transfection of mammalian cells, it is known 
that, depending upon the expression vector and 
transfection technique used, only a small fraction of
cells may integrate the foreign DNA into their genome.
In order to identify and select these integrants, a gene
that encodes a selectable marker (e.g., for resistance to
antibiotics) is generally introduced into the host cells
along with the gene of interest. Preferred selectable
markers include those which confer resistance to drugs,
such as G418, hygromycin and methotrexate. Cells stably
transfected with the introduced nucleic acid can be
identified by drug selection (e.g., cells that have
incorporated the selectable marker gene will survive,
while the other cells die).

A host cell of the invention, such as a prokaryotic or
eukaryotic host cell in culture, can be used to produce a
polypeptide of the invention. Accordingly, the invention
further provides methods for producing a polypeptide of
the invention using the host cells of the invention. In
one embodiment, the method comprises culturing the host
cell of the invention (into which a recombinant expression
vector encoding a polypeptide of the invention has been
introduced) in a suitable medium such that the
polypeptide is produced. In another embodiment, the
method further comprises isolating the polypeptide from 
the medium or the host cell. 

The host cells of the invention can also be used to 
produce non-human transgenic animals. For example, in one
embodiment, a host cell of the invention is a fertilized
oocyte or an embryonic stem cell into which sequences
encoding a polypeptide of the invention have been
introduced. Such host cells can then be used to create
non-human transgenic animals in which exogenous sequences
encoding a polypeptide of the invention have been
introduced into their genome or homologous recombinant
animals in which endogenous sequences encoding a
polypeptide of the invention have been altered. Such
animals are useful for studying the function and/or
activity of the polypeptide and for identifying and/or
evaluating modulators of polypeptide activity. As used
herein, a "transgenic animal" is a non-human animal,
preferably a mammal, more preferably a rodent such as a
rat or mouse, in which one or more of the cells of the
animal includes a transgene. Other examples of
transgenic animals include non-human primates, sheep,
dogs, cows, goats, chickens, amphibians, etc. A
transgene is exogenous DNA which is integrated into the
genome of a cell from which a transgenic animal develops
and which remains in the genome of the mature animal,
thereby directing the expression of an encoded gene
product in one or more cell types or tissues of the
transgenic animal. As used herein, an "homologous
recombinant animal" is a non-human animal, preferably a
mammal, more preferably a mouse, in which an endogenous
gene has been altered by homologous recombination between
the endogenous gene and an exogenous DNA molecule
introduced into a cell of the animal, e.g., an embryonic
cell of the animal, prior to development of the animal.

A transgenic animal of the invention can be created by
introducing nucleic acid encoding a polypeptide of the
invention (or a homologue thereof) into the male
pronuclei of a fertilized oocyte, e.g., by
microinjection or retroviral infection, and allowing the
oocyte to develop in a pseudopregnant female foster
animal. Intronic sequences and polyadenylation signals
can also be included in the transgene to increase the
efficiency of expression of the transgene. A tissue-
specific regulatory sequence(s) can be operably linked to
the transgene to direct expression of the polypeptide of
the invention to particular cells. Methods for
generating transgenic animals via embryo manipulation and
microinjection, particularly animals such as mice, have
become conventional in the art and are described, for
example, in U.S. Patent Nos. 4,736,866 and 4,870,009,
U.S. Patent No. 4,873,191 and in Hogan, Manipulating the
Mouse Embryo, (Cold Spring Harbor Laboratory Press, Cold
Spring Harbor, N.Y., 1986). Similar methods are used for
production of other transgenic animals. A transgenic
founder animal can be identified based upon the presence
of the transgene in its genome and/or expression of mRNA
encoding the transgene in tissues or cells of the
animals. A transgenic founder animal can then be used to
breed additional animals carrying the transgene.
Moreover, transgenic animals carrying the transgene can
further be bred to other transgenic animals carrying 
other transgenes. 

To create an homologous recombinant animal, a vector is 
prepared which contains at least a portion of a gene
encoding a polypeptide of the invention into which a
deletion, addition or substitution has been introduced to
thereby alter, e.g., functionally disrupt, the gene. In
a preferred embodiment, the vector is designed such that,
upon homologous recombination, the endogenous gene is
functionally disrupted (i.e., no longer encodes a
functional protein; also referred to as a "knock out"
vector). Alternatively, the vector can be designed such
that, upon homologous recombination, the endogenous gene
is mutated or otherwise altered but still encodes
functional protein (e.g., the upstream regulatory region
can be altered to thereby alter the expression of the
endogenous protein). In the homologous recombination
vector, the altered portion of the gene is flanked at its
5' and 3' ends by additional nucleic acid of the gene to
allow for homologous recombination to occur between the
exogenous gene carried by the vector and an endogenous
gene in an embryonic stem cell. The additional flanking
nucleic acid sequences are of sufficient length for
successful homologous recombination with the endogenous
gene. Typically, several kilobases of flanking DNA (both
at the 5' and 3' ends) are included in the vector (see,
e.g., Thomas and Capecchi (1987) Cell 51:503 for a
description of homologous recombination vectors). The
vector is introduced into an embryonic stem cell line
(e.g., by electroporation) and cells in which the
introduced gene has homologously recombined with the
endogenous gene are selected (see, e.g., Li et al. (1992)
Cell 69:915). The selected cells are then injected into
a blastocyst of an animal (e.g., a mouse) to form
aggregation chimeras (see, e.g., Bradley in
Teratocarcinomas and Embryonic Stem Cells: A Practical
Approach, Robertson, ed. (IRL, Oxford, 1987) pp. 113-
152). A chimeric embryo can then be implanted into a
suitable pseudopregnant female foster animal and the
embryo brought to term. Progeny harboring the
homologously recombined DNA in their germ cells can be
used to breed animals in which all cells of the animal
contain the homologously recombined DNA by germline
transmission of the transgene. Methods for constructing
homologous recombination vectors and homologous
recombinant animals are described further in Bradley
(1991) Current Opinion in Bio/Technology 2:823-829 and in
PCT Publication Nos. WO 90/11354, WO 91/01140, WO
92/0968, and WO 93/04169. 

20 In another embodiment, transgenic non-human animals can 
be produced which contain selected systems which allow 
for regulated expression of the transgene. One example
of such a system is the cre/loxP recombinase system of
bacteriophage P1. For a description of the cre/loxP
recombinase system, see, e.g., Lakso et al. (1992) Proc.
Natl. Acad. Sci. USA 89:6232-6236. Another example of a
recombinase system is the FLP recombinase system of
Saccharomyces cerevisiae (O'Gorman et al. (1991) Science
251:1351-1355). If a cre/loxP recombinase system is used
to regulate expression of the transgene, animals
containing transgenes encoding both the Cre recombinase
and a selected protein are required. Such animals can be
provided through the construction of "double" transgenic
animals, e.g., by mating two transgenic animals, one
containing a transgene encoding a selected protein and
the other containing a transgene encoding a recombinase. 
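
As a rough worked example of the mating scheme just described, the expected fraction of "double" transgenic offspring can be computed from the transmission probability of each parent's transgene. The sketch below assumes independent (unlinked) transgenes and simple Mendelian transmission, with a probability of 1/2 for a hemizygous parent and 1 for a homozygous parent; these assumptions and the function name are illustrative and do not come from the specification.

```python
from fractions import Fraction
from itertools import product


def offspring_genotypes(p_recombinase, p_target):
    """Return the probability of each (has recombinase, has target) class.

    p_recombinase and p_target are the probabilities that the respective
    parent transmits its transgene to a given offspring.
    """
    classes = {}
    for has_rec, has_target in product((True, False), repeat=2):
        p_rec = p_recombinase if has_rec else 1 - p_recombinase
        p_tgt = p_target if has_target else 1 - p_target
        classes[(has_rec, has_target)] = p_rec * p_tgt
    return classes
```

With two hemizygous parents, `offspring_genotypes(Fraction(1, 2), Fraction(1, 2))` gives one quarter of the offspring carrying both transgenes, so litters would be genotyped to identify the double transgenic animals.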

Clones of the non-human transgenic animals described 
herein can also be produced according to the methods 
described in Wilmut et al. (1997) Nature 385:810-813 and
PCT Publication Nos. WO 97/07668 and WO 97/07669.

IV. Pharmaceutical Compositions 

The nucleic acid molecules, polypeptides, and
antibodies (also referred to herein as "active
compounds") of the invention can be incorporated into
pharmaceutical compositions suitable for administration.
Such compositions typically comprise the nucleic acid
molecule, protein, or antibody and a pharmaceutically
acceptable carrier. As used herein the language
"pharmaceutically acceptable carrier" is intended to
include any and all solvents, dispersion media, coatings,
antibacterial and antifungal agents, isotonic and
absorption delaying agents, and the like, compatible with
pharmaceutical administration. The use of such media and
agents for pharmaceutically active substances is well
known in the art. Except insofar as any conventional
media or agent is incompatible with the active compound,
use thereof in the compositions is contemplated.
Supplementary active compounds can also be incorporated
into the compositions.

The invention includes methods for preparing 
pharmaceutical compositions for modulating the expression 
or activity of a polypeptide or nucleic acid of the 
invention. Such methods comprise formulating a
pharmaceutically acceptable carrier with an agent which
modulates expression or activity of a polypeptide or
nucleic acid of the invention. Such compositions can
further include additional active agents. Thus, the
invention further includes methods for preparing a
pharmaceutical composition by formulating a
pharmaceutically acceptable carrier with an agent which
modulates expression or activity of a polypeptide or
nucleic acid of the invention and one or more additional
active compounds.

A pharmaceutical composition of the invention is
formulated to be compatible with its intended route of
administration. Examples of routes of administration
include parenteral, e.g., intravenous, intradermal,
subcutaneous, oral (e.g., inhalation), transdermal
(topical), transmucosal, and rectal administration.
Solutions or suspensions used for parenteral,
intradermal, or subcutaneous application can include the
following components: a sterile diluent such as water for
injection, saline solution, fixed oils, polyethylene
glycols, glycerine, propylene glycol or other synthetic
solvents; antibacterial agents such as benzyl alcohol or
methyl parabens; antioxidants such as ascorbic acid or
sodium bisulfite; chelating agents such as
ethylenediaminetetraacetic acid; buffers such as
acetates, citrates or phosphates and agents for the
adjustment of tonicity such as sodium chloride or
dextrose. pH can be adjusted with acids or bases, such
as hydrochloric acid or sodium hydroxide. The parenteral
preparation can be enclosed in ampoules, disposable
syringes or multiple dose vials made of glass or plastic.

Pharmaceutical compositions suitable for injectable use
include sterile aqueous solutions (where water soluble)
or dispersions and sterile powders for the extemporaneous
preparation of sterile injectable solutions or
dispersions. For intravenous administration, suitable
carriers include physiological saline, bacteriostatic
water, Cremophor EL™ (BASF; Parsippany, NJ) or phosphate
buffered saline (PBS). In all cases, the composition
must be sterile and should be fluid to the extent that
easy syringability exists. It must be stable under the
conditions of manufacture and storage and must be
preserved against the contaminating action of
microorganisms such as bacteria and fungi. The carrier
can be a solvent or dispersion medium containing, for
example, water, ethanol, polyol (for example, glycerol,
propylene glycol, and liquid polyethylene glycol, and
the like), and suitable mixtures thereof. The proper
fluidity can be maintained, for example, by the use of a
coating such as lecithin, by the maintenance of the
required particle size in the case of dispersion and by
the use of surfactants. Prevention of the action of
microorganisms can be achieved by various antibacterial
and antifungal agents, for example, parabens,
chlorobutanol, phenol, ascorbic acid, thimerosal, and the
like. In many cases, it will be preferable to include
isotonic agents, for example, sugars, polyalcohols such
as mannitol, sorbitol, sodium chloride in the
composition. Prolonged absorption of the injectable
compositions can be brought about by including in the
composition an agent which delays absorption, for
example, aluminum monostearate and gelatin.

Sterile injectable solutions can be prepared by
incorporating the active compound (e.g., a polypeptide or
antibody) in the required amount in an appropriate
solvent with one or a combination of ingredients
enumerated above, as required, followed by filtered
sterilization. Generally, dispersions are prepared by
incorporating the active compound into a sterile vehicle
which contains a basic dispersion medium and the required
other ingredients from those enumerated above. In the
case of sterile powders for the preparation of sterile
injectable solutions, the preferred methods of
preparation are vacuum drying and freeze-drying, which
yield a powder of the active ingredient plus any
additional desired ingredient from a previously
sterile-filtered solution thereof.

Oral compositions generally include an inert diluent or
an edible carrier. They can be enclosed in gelatin
capsules or compressed into tablets. For the purpose of
oral therapeutic administration, the active compound can
be incorporated with excipients and used in the form of
tablets, troches, or capsules. Oral compositions can
also be prepared using a fluid carrier for use as a
mouthwash, wherein the compound in the fluid carrier is
applied orally and swished and expectorated or swallowed.
Pharmaceutically compatible binding agents, and/or
adjuvant materials can be included as part of the
composition. The tablets, pills, capsules, troches and
the like can contain any of the following ingredients, or
compounds of a similar nature: a binder such as
microcrystalline cellulose, gum tragacanth or gelatin; an
excipient such as starch or lactose; a disintegrating
agent such as alginic acid, Primogel, or corn starch; a
lubricant such as magnesium stearate or Sterotes; a
glidant such as colloidal silicon dioxide; a sweetening
agent such as sucrose or saccharin; or a flavoring agent
such as peppermint, methyl salicylate, or orange
flavoring.

For administration by inhalation, the compounds are
delivered in the form of an aerosol spray from a
pressurized container or dispenser which contains a
suitable propellant, e.g., a gas such as carbon dioxide,
or a nebulizer.

Systemic administration can also be by transmucosal or
transdermal means. For transmucosal or transdermal
administration, penetrants appropriate to the barrier to
be permeated are used in the formulation. Such
penetrants are generally known in the art, and include,
for example, for transmucosal administration, detergents,
bile salts, and fusidic acid derivatives. Transmucosal
administration can be accomplished through the use of
nasal sprays or suppositories. For transdermal
administration, the active compounds are formulated into
ointments, salves, gels, or creams as generally known in
the art.

The compounds can also be prepared in the form of
suppositories (e.g., with conventional suppository bases
such as cocoa butter and other glycerides) or retention
enemas for rectal delivery.

In one embodiment, the active compounds are prepared
with carriers that will protect the compound against
rapid elimination from the body, such as a controlled
release formulation, including implants and
microencapsulated delivery systems. Biodegradable,
biocompatible polymers can be used, such as ethylene
vinyl acetate, polyanhydrides, polyglycolic acid,
collagen, polyorthoesters, and polylactic acid. Methods
for preparation of such formulations will be apparent to
those skilled in the art. The materials can also be
obtained commercially from Alza Corporation and Nova
Pharmaceuticals, Inc. Liposomal suspensions (including
liposomes targeted to infected cells with monoclonal
antibodies to viral antigens) can also be used as
pharmaceutically acceptable carriers. These can be
prepared according to methods known to those skilled in
the art, for example, as described in U.S. Patent No.
4,522,811.

It is especially advantageous to formulate oral or
parenteral compositions in dosage unit form for ease of
administration and uniformity of dosage. Dosage unit
form as used herein refers to physically discrete units
suited as unitary dosages for the subject to be treated;
each unit containing a predetermined quantity of active
compound calculated to produce the desired therapeutic
effect in association with the required pharmaceutical
carrier. The specifications for the dosage unit forms of
the invention are dictated by and directly dependent on
the unique characteristics of the active compound and the
particular therapeutic effect to be achieved, and the
limitations inherent in the art of compounding such an
active compound for the treatment of individuals.

For antibodies, the preferred dosage is 0.1 mg/kg to
100 mg/kg of body weight (generally 10 mg/kg to 20
mg/kg). If the antibody is to act in the brain, a dosage
of 50 mg/kg to 100 mg/kg is usually appropriate.
Generally, partially human antibodies and fully human
antibodies have a longer half-life within the human body
than other antibodies. Accordingly, lower dosages and
less frequent administration are often possible.
Modifications such as lipidation can be used to stabilize
antibodies and to enhance uptake and tissue penetration
(e.g., into the brain). A method for lipidation of
antibodies is described by Cruikshank et al. ((1997) J.
Acquired Immune Deficiency Syndromes and Human
Retrovirology 14:193).
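
By way of illustration only, the weight-based antibody dosage
ranges stated above can be converted into total amounts per
subject. The following Python sketch is not part of the
specification; the function names and the 70 kg example body
weight are assumptions chosen for the example.

```python
# Illustrative sketch: converts the mg/kg ranges quoted in the text
# into total doses in mg. Names and the example weight are hypothetical.

# Ranges stated in the text, in mg per kg of body weight.
RANGES_MG_PER_KG = {
    "preferred": (0.1, 100.0),   # overall preferred range
    "general": (10.0, 20.0),     # "generally 10 mg/kg to 20 mg/kg"
    "brain": (50.0, 100.0),      # antibody intended to act in the brain
}


def dose_range_mg(body_weight_kg, low_mg_per_kg, high_mg_per_kg):
    """Return (low, high) total dose in mg for a weight-based range."""
    if body_weight_kg <= 0:
        raise ValueError("body weight must be positive")
    return (body_weight_kg * low_mg_per_kg, body_weight_kg * high_mg_per_kg)


def doses_for(body_weight_kg):
    """Return the total-dose range in mg for each range named in the text."""
    return {
        name: dose_range_mg(body_weight_kg, low, high)
        for name, (low, high) in RANGES_MG_PER_KG.items()
    }


if __name__ == "__main__":
    for name, (low, high) in doses_for(70).items():  # 70 kg is an assumed weight
        print(f"{name}: {low:g} mg to {high:g} mg")
```

The sketch performs unit arithmetic only and does not select a
dose within a range; that selection depends on the factors
discussed in the surrounding text.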

The nucleic acid molecules of the invention can be
inserted into vectors and used as gene therapy vectors.
Gene therapy vectors can be delivered to a subject by,
for example, intravenous injection, local administration
(U.S. Patent 5,328,470) or by stereotactic injection
(see, e.g., Chen et al. (1994) Proc. Natl. Acad. Sci. USA
91:3054-3057). The pharmaceutical preparation of the
gene therapy vector can include the gene therapy vector
in an acceptable diluent, or can comprise a slow release
matrix in which the gene delivery vehicle is imbedded.
Alternatively, where the complete gene delivery vector
can be produced intact from recombinant cells, e.g.
retroviral vectors, the pharmaceutical preparation can
include one or more cells which produce the gene delivery
system.

The pharmaceutical compositions can be included in a
container, pack, or dispenser together with instructions
for administration.

V. Uses and Methods of the Invention 

The nucleic acid molecules, proteins, protein
homologues, and antibodies described herein can be used
in one or more of the following methods: a) screening
assays; b) detection assays (e.g., chromosomal mapping,
tissue typing, forensic biology); c) predictive medicine
(e.g., diagnostic assays, prognostic assays, monitoring
clinical trials, and pharmacogenomics); and d) methods of
treatment (e.g., therapeutic and prophylactic). For
example, polypeptides of the invention can be used to (i)
modulate cellular proliferation; (ii) modulate cellular
differentiation; and (iii) modulate cell survival. The
isolated nucleic acid molecules of the invention can be
used to express proteins (e.g., via a recombinant
expression vector in a host cell in gene therapy
applications), to detect mRNA (e.g., in a biological
sample) or a genetic lesion, and to modulate activity of
a polypeptide of the invention. In addition, the
polypeptides of the invention can be used to screen drugs
or compounds which modulate activity or expression of a
polypeptide of the invention as well as to treat
disorders characterized by insufficient or excessive
production of a protein of the invention or production of
a form of a protein of the invention which has decreased
or aberrant activity compared to the wild type protein.
In addition, the antibodies of the invention can be used
to detect and isolate a protein of the invention and
modulate activity of a protein of the invention.

This invention further pertains to novel agents
identified by the above-described screening assays and
uses thereof for treatments as described herein.



A. Screening Assays 
The invention provides a method (also referred to
herein as a "screening assay") for identifying
modulators, i.e., candidate or test compounds or agents
(e.g., peptides, peptidomimetics, small molecules or
other drugs) which bind to a polypeptide of the invention
or have a stimulatory or inhibitory effect on, for
example, expression or activity of a polypeptide of the
invention.

In one embodiment, the invention provides assays for
screening candidate or test compounds which bind to or
modulate the activity of the membrane-bound form of a
polypeptide of the invention or biologically active
portion thereof. The test compounds of the present
invention can be obtained using any of the numerous
approaches in combinatorial library methods known in the
art, including: biological libraries; spatially
addressable parallel solid phase or solution phase
libraries; synthetic library methods requiring
deconvolution; the "one-bead one-compound" library
method; and synthetic library methods using affinity
chromatography selection. The biological library
approach is limited to peptide libraries, while the other
four approaches are applicable to peptide, non-peptide
oligomer or small molecule libraries of compounds (Lam
(1997) Anticancer Drug Des. 12:145).

Examples of methods for the synthesis of molecular
libraries can be found in the art, for example in:
DeWitt et al. (1993) Proc. Natl. Acad. Sci. USA 90:6909;
Erb et al. (1994) Proc. Natl. Acad. Sci. USA 91:11422;
Zuckermann et al. (1994) J. Med. Chem. 37:2678; Cho et
al. (1993) Science 261:1303; Carell et al. (1994) Angew.
Chem. Int. Ed. Engl. 33:2059; Carell et al. (1994) Angew.
Chem. Int. Ed. Engl. 33:2061; and Gallop et al. (1994) J.
Med. Chem. 37:1233.

Libraries of compounds may be presented in solution
(e.g., Houghten (1992) Bio/Techniques 13:412-421), or on
beads (Lam (1991) Nature 354:82-84), chips (Fodor (1993)
Nature 364:555-556), bacteria (U.S. Patent No.
5,223,409), spores (Patent Nos. 5,571,698; 5,403,484; and
5,223,409), plasmids (Cull et al. (1992) Proc. Natl.
Acad. Sci. USA 89:1865-1869) or phage (Scott and Smith
(1990) Science 249:386-390; Devlin (1990) Science
249:404-406; Cwirla et al. (1990) Proc. Natl. Acad. Sci.
USA 87:6378-6382; and Felici (1991) J. Mol. Biol.
222:301-310).

In one embodiment, an assay is a cell-based assay in
which a cell which expresses a membrane-bound form of a
polypeptide of the invention, or a biologically active
portion thereof, on the cell surface is contacted with a
test compound and the ability of the test compound to
bind to the polypeptide determined. The cell, for
example, can be a yeast cell or a cell of mammalian
origin. Determining the ability of the test compound to
bind to the polypeptide can be accomplished, for example,
by coupling the test compound with a radioisotope or
enzymatic label such that binding of the test compound to
the polypeptide or biologically active portion thereof
can be determined by detecting the labeled compound in a
complex. For example, test compounds can be labeled with
¹²⁵I, ³⁵S, ¹⁴C, or ³H, either directly or indirectly, and
the radioisotope detected by direct counting of
radioemission or by scintillation counting.
Alternatively, test compounds can be enzymatically
labeled with, for example, horseradish peroxidase,
alkaline phosphatase, or luciferase, and the enzymatic
label detected by determination of conversion of an
appropriate substrate to product. In a preferred
embodiment, the assay comprises contacting a cell which
expresses a membrane-bound form of a polypeptide of the
invention, or a biologically active portion thereof, on
the cell surface with a known compound which binds the
polypeptide to form an assay mixture, contacting the
assay mixture with a test compound, and determining the
ability of the test compound to interact with the
polypeptide, wherein determining the ability of the test
compound to interact with the polypeptide comprises
determining the ability of the test compound to
preferentially bind to the polypeptide or a biologically
active portion thereof as compared to the known compound.
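
One way such a comparison against a known compound might be
quantified is sketched below in Python. This is an illustration
under stated assumptions, not a method prescribed by the
specification: it assumes that the known compound (rather than
the test compound) carries the radiolabel, that bound label is
read out as counts per minute, and that a nonspecific-binding
background has been measured separately. All names are
hypothetical.

```python
# Illustrative sketch: percent displacement of a labeled known compound
# by an unlabeled test compound. All names and inputs are hypothetical.

def percent_displacement(cpm_without_test, cpm_with_test, cpm_nonspecific=0.0):
    """Return the percentage of specific binding lost when the test
    compound is added to the assay mixture.

    cpm_without_test: bound counts with the labeled known compound alone
    cpm_with_test:    bound counts after adding the test compound
    cpm_nonspecific:  background counts not attributable to the polypeptide
    """
    specific_without = cpm_without_test - cpm_nonspecific
    specific_with = cpm_with_test - cpm_nonspecific
    if specific_without <= 0:
        raise ValueError("no specific binding of the labeled compound to displace")
    return 100.0 * (1.0 - specific_with / specific_without)


if __name__ == "__main__":
    # Assumed example readings in counts per minute.
    print(percent_displacement(10500, 3000, cpm_nonspecific=500))
```

A high displacement value would be consistent with the test
compound competing with the known compound for the polypeptide;
the cutoff treated as meaningful is left to the practitioner.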

In another embodiment, an assay is a cell-based assay
comprising contacting a cell expressing a membrane-bound
form of a polypeptide of the invention, or a biologically
active portion thereof, on the cell surface with a test
compound and determining the ability of the test compound
to modulate (e.g., stimulate or inhibit) the activity of
the polypeptide or biologically active portion thereof.
Determining the ability of the test compound to modulate
the activity of the polypeptide or a biologically active
portion thereof can be accomplished, for example, by
determining the ability of the polypeptide protein to
bind to or interact with a target molecule.

Determining the ability of a polypeptide of the
invention to bind to or interact with a target molecule
can be accomplished by one of the methods described above
for determining direct binding. As used herein, a
"target molecule" is a molecule with which a selected
polypeptide (e.g., a polypeptide of the invention) binds
or interacts in nature, for example, a molecule on
the surface of a cell which expresses the selected
protein, a molecule on the surface of a second cell, a
molecule in the extracellular milieu, a molecule
associated with the internal surface of a cell membrane
or a cytoplasmic molecule. A target molecule can be a
polypeptide of the invention or some other polypeptide or
protein. For example, a target molecule can be a
component of a signal transduction pathway which
facilitates transduction of an extracellular signal
(e.g., a signal generated by binding of a compound to a
polypeptide of the invention) through the cell membrane
and into the cell or a second intercellular protein which
has catalytic activity or a protein which facilitates the
association of downstream signaling molecules with a
polypeptide of the invention. Determining the ability of
a polypeptide of the invention to bind to or interact
with a target molecule can be accomplished by determining
the activity of the target molecule. For example, the
activity of the target molecule can be determined by
detecting induction of a cellular second messenger of the
target (e.g., intracellular Ca²⁺, diacylglycerol, IP3,
etc.), detecting catalytic/enzymatic activity of the
target on an appropriate substrate, detecting the
induction of a reporter gene (e.g., a regulatory element
that is responsive to a polypeptide of the invention
operably linked to a nucleic acid encoding a detectable
marker, e.g. luciferase), or detecting a cellular
response, for example, cellular differentiation, or cell
proliferation.

In yet another embodiment, an assay of the present
invention is a cell-free assay comprising contacting a
polypeptide of the invention or biologically active
portion thereof with a test compound and determining the
ability of the test compound to bind to the polypeptide
or biologically active portion thereof. Binding of the
test compound to the polypeptide can be determined either
directly or indirectly as described above. In a
preferred embodiment, the assay includes contacting the
polypeptide of the invention or biologically active
portion thereof with a known compound which binds the
polypeptide to form an assay mixture, contacting the
assay mixture with a test compound, and determining the
ability of the test compound to interact with the
polypeptide, wherein determining the ability of the test
compound to interact with the polypeptide comprises
determining the ability of the test compound to
preferentially bind to the polypeptide or biologically
active portion thereof as compared to the known compound.

In another embodiment, an assay is a cell-free assay
comprising contacting a polypeptide of the invention or
biologically active portion thereof with a test compound
and determining the ability of the test compound to
modulate (e.g., stimulate or inhibit) the activity of the
polypeptide or biologically active portion thereof.
Determining the ability of the test compound to modulate
the activity of the polypeptide can be accomplished, for
example, by determining the ability of the polypeptide to
bind to a target molecule by one of the methods described
above for determining direct binding. In an alternative
embodiment, determining the ability of the test compound
to modulate the activity of the polypeptide can be
accomplished by determining the ability of the
polypeptide of the invention to further modulate the
target molecule. For example, the catalytic/enzymatic
activity of the target molecule on an appropriate
substrate can be determined as previously described.

In yet another embodiment, the cell-free assay
comprises contacting a polypeptide of the invention or
biologically active portion thereof with a known compound
which binds the polypeptide to form an assay mixture,
contacting the assay mixture with a test compound, and
determining the ability of the test compound to interact
with the polypeptide, wherein determining the ability of
the test compound to interact with the polypeptide
comprises determining the ability of the polypeptide to
preferentially bind to or modulate the activity of a
target molecule.

The cell-free assays of the present invention are
amenable to use of both a soluble form and the
membrane-bound form of a polypeptide of the invention. In the
case of cell-free assays comprising the membrane-bound
form of the polypeptide, it may be desirable to utilize a
solubilizing agent such that the membrane-bound form of
the polypeptide is maintained in solution. Examples of
such solubilizing agents include non-ionic detergents
such as n-octylglucoside, n-dodecylglucoside,
n-dodecylmaltoside, octanoyl-N-methylglucamide,
decanoyl-N-methylglucamide, Triton X-100, Triton X-114, Thesit,
Isotridecypoly(ethylene glycol ether)n,
3-[(3-cholamidopropyl)dimethylammonio]-1-propane sulfonate
(CHAPS), 3-[(3-cholamidopropyl)dimethylammonio]-2-
hydroxy-1-propane sulfonate (CHAPSO), or N-dodecyl-N,N-
dimethyl-3-ammonio-1-propane sulfonate.

In more than one embodiment of the above assay methods
of the present invention, it may be desirable to
immobilize either the polypeptide of the invention or its
target molecule to facilitate separation of complexed
from uncomplexed forms of one or both of the proteins, as
well as to accommodate automation of the assay. Binding
of a test compound to the polypeptide, or interaction of
the polypeptide with a target molecule in the presence
and absence of a candidate compound, can be accomplished
in any vessel suitable for containing the reactants.
Examples of such vessels include microtitre plates, test
tubes, and micro-centrifuge tubes. In one embodiment, a
fusion protein can be provided which adds a domain that
allows one or both of the proteins to be bound to a
matrix. For example, glutathione-S-transferase fusion
proteins can be adsorbed onto glutathione sepharose beads
(Sigma Chemical; St. Louis, MO) or glutathione derivatized
microtitre plates, which are then combined with the test
compound or the test compound and either the non-adsorbed
target protein or a polypeptide of the invention, and the
mixture incubated under conditions conducive to complex
formation (e.g., at physiological conditions for salt and
pH). Following incubation, the beads or microtitre plate
wells are washed to remove any unbound components and
complex formation is measured either directly or
indirectly, for example, as described above.
Alternatively, the complexes can be dissociated from the
matrix, and the level of binding or activity of the
polypeptide of the invention can be determined using
standard techniques.

Other techniques for immobilizing proteins on matrices
can also be used in the screening assays of the
invention. For example, either the polypeptide of the
invention or its target molecule can be immobilized
utilizing conjugation of biotin and streptavidin.
Biotinylated polypeptide of the invention or target
molecules can be prepared from biotin-NHS
(N-hydroxy-succinimide) using techniques well known in the art
(e.g., biotinylation kit, Pierce Chemicals; Rockford,
IL), and immobilized in the wells of streptavidin-coated
96 well plates (Pierce Chemical). Alternatively,
antibodies reactive with the polypeptide of the invention
or target molecules but which do not interfere with
binding of the polypeptide of the invention to its target
molecule can be derivatized to the wells of the plate,
and unbound target or polypeptide of the invention
trapped in the wells by antibody conjugation. Methods
for detecting such complexes, in addition to those
described above for the GST-immobilized complexes,
include immunodetection of complexes using antibodies
reactive with the polypeptide of the invention or target
molecule, as well as enzyme-linked assays which rely on
detecting an enzymatic activity associated with the
polypeptide of the invention or target molecule.

In another embodiment, modulators of expression of a
polypeptide of the invention are identified in a method
in which a cell is contacted with a candidate compound
and the expression of the selected mRNA or protein (i.e.,
the mRNA or protein corresponding to a polypeptide or
nucleic acid of the invention) in the cell is determined.
The level of expression of the selected mRNA or protein
in the presence of the candidate compound is compared to
the level of expression of the selected mRNA or protein
in the absence of the candidate compound. The candidate
compound can then be identified as a modulator of
expression of the polypeptide of the invention based on
this comparison. For example, when expression of the
selected mRNA or protein is greater (statistically
significantly greater) in the presence of the candidate
compound than in its absence, the candidate compound is
identified as a stimulator of the selected mRNA or
protein expression. Alternatively, when expression of
the selected mRNA or protein is less (statistically
significantly less) in the presence of the candidate
compound than in its absence, the candidate compound is
identified as an inhibitor of the selected mRNA or
protein expression. The level of the selected mRNA or
protein expression in the cells can be determined by
methods described herein.
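
The comparison described above can be sketched in Python as
follows. The specification requires only that the difference be
statistically significant and does not name a test; the exact
permutation test on the difference of means, the 0.05
significance threshold, and all function names below are
assumptions made for the purpose of illustration.

```python
# Illustrative sketch: classify a candidate compound as a stimulator or
# inhibitor of expression from replicate measurements. The choice of an
# exact permutation test and alpha = 0.05 are assumptions.
from itertools import combinations
from statistics import mean


def permutation_p_value(treated, control):
    """Two-sided exact permutation test on the difference of group means."""
    pooled = list(treated) + list(control)
    n_treated = len(treated)
    observed = abs(mean(treated) - mean(control))
    total = 0
    extreme = 0
    # Enumerate every way of relabeling n_treated measurements as "treated".
    for idx in combinations(range(len(pooled)), n_treated):
        chosen = set(idx)
        group_a = [pooled[i] for i in chosen]
        group_b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        # Small tolerance guards against floating-point ties.
        if abs(mean(group_a) - mean(group_b)) >= observed - 1e-12:
            extreme += 1
    return extreme / total


def classify_candidate(treated, control, alpha=0.05):
    """Return 'stimulator', 'inhibitor', or 'no significant effect'."""
    if permutation_p_value(treated, control) >= alpha:
        return "no significant effect"
    return "stimulator" if mean(treated) > mean(control) else "inhibitor"


if __name__ == "__main__":
    # Assumed expression levels (arbitrary units), four replicates each.
    with_compound = [9.8, 10.4, 10.1, 9.9]
    without_compound = [5.0, 5.3, 4.8, 5.1]
    print(classify_candidate(with_compound, without_compound))
```

With an exact permutation test, at least four replicates per
condition are needed before a two-sided p-value below 0.05 is
attainable, since three per condition give only 20 possible
relabelings.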

In yet another aspect of the invention, the polypeptides
of the invention can be used as "bait proteins" in a
two-hybrid assay or three hybrid assay (see, e.g., U.S.
Patent No. 5,283,317; Zervos et al. (1993) Cell
72:223-232; Madura et al. (1993) J. Biol. Chem. 268:12046-12054;
Bartel et al. (1993) Bio/Techniques 14:920-924; Iwabuchi
et al. (1993) Oncogene 8:1693-1696; and PCT Publication
No. WO 94/10300), to identify other proteins, which bind
to or interact with the polypeptide of the invention and
modulate activity of the polypeptide of the invention.
Such binding proteins are also likely to be involved in
the propagation of signals by the polypeptide of the
invention as, for example, upstream or downstream
elements of a signaling pathway involving the polypeptide
of the invention.

This invention further pertains to novel agents
identified by the above-described screening assays and
uses thereof for treatments as described herein.

B. Detection Assays

Portions or fragments of the cDNA sequences identified
herein (and the corresponding complete gene sequences)
can be used in numerous ways as polynucleotide reagents.
For example, these sequences can be used to: (i) map
their respective genes on a chromosome and, thus, locate
gene regions associated with genetic disease; (ii)
identify an individual from a minute biological sample
(tissue typing); and (iii) aid in forensic identification
of a biological sample. These applications are described
in the subsections below.

1. Chromosome Mapping

Once the sequence (or a portion of the sequence) of a
gene has been isolated, this sequence can be used to map
the location of the gene on a chromosome. Accordingly,
nucleic acid molecules described herein or fragments
thereof, can be used to map the location of the
corresponding genes on a chromosome. The mapping of the
sequences to chromosomes is an important first step in
correlating these sequences with genes associated with
disease.

Briefly, genes can be mapped to chromosomes by
preparing PCR primers (preferably 15-25 bp in length)
from the sequence of a gene of the invention. Computer
analysis of the sequence of a gene of the invention can
be used to rapidly select primers that do not span more
than one exon in the genomic DNA, since primers that do
would complicate the amplification process. These
primers can then be used
for PCR screening of somatic cell hybrids containing
individual human chromosomes. Only those hybrids
containing the human gene corresponding to the gene
sequences will yield an amplified fragment. For a review
of this technique, see D'Eustachio et al. ((1983) Science
220:919-924).
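
The computer-assisted primer selection mentioned above might,
for example, be sketched in Python as below. The sketch assumes
that exon boundaries are already known as half-open intervals in
cDNA coordinates; the function name, the toy sequence and the
boundaries are hypothetical, and no melting-temperature or
specificity screening is attempted.

```python
# Illustrative sketch: list candidate primer sequences that lie entirely
# within a single exon. Exon coordinates and names are hypothetical.

def primers_within_single_exon(cdna, exon_bounds, length=20):
    """Return (start, sequence) pairs for windows contained in one exon.

    cdna:        cDNA sequence as a string
    exon_bounds: list of (start, end) half-open intervals in cDNA coordinates
    length:      primer length in bases (the text prefers 15-25)
    """
    if not 15 <= length <= 25:
        raise ValueError("primer length should be 15-25 bases")
    candidates = []
    for exon_start, exon_end in exon_bounds:
        # The last admissible start keeps the whole window inside the exon.
        for start in range(exon_start, exon_end - length + 1):
            candidates.append((start, cdna[start:start + length]))
    return candidates


if __name__ == "__main__":
    toy_cdna = "ACGT" * 15             # 60-base placeholder sequence
    toy_exons = [(0, 30), (30, 60)]    # assumed exon-exon junction at position 30
    for start, seq in primers_within_single_exon(toy_cdna, toy_exons)[:3]:
        print(start, seq)
```

Only forward-strand windows are listed; a reverse primer would
be chosen in the same way from the complementary strand.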

PCR mapping of somatic cell hybrids is a rapid
procedure for assigning a particular sequence to a
particular chromosome. Three or more sequences can be
assigned per day using a single thermal cycler. Using
the nucleic acid sequences of the invention to design
oligonucleotide primers, sublocalization can be achieved
with panels of fragments from specific chromosomes.
Other mapping strategies which can similarly be used to
map a gene to its chromosome include in situ
hybridization (described in Fan et al. (1990) Proc. Natl.
Acad. Sci. USA 87:6223-27), pre-screening with labeled
flow-sorted chromosomes, and pre-selection by
hybridization to chromosome specific cDNA libraries.
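
The assignment of a sequence to a chromosome from a somatic cell
hybrid panel can be illustrated by the following Python sketch.
It assumes a panel in which the human chromosomes retained by
each hybrid are known, and it reports the chromosomes whose
presence matches the PCR result in every hybrid. The panel
contents and all names are hypothetical.

```python
# Illustrative sketch: find the human chromosome(s) whose presence in each
# hybrid is fully concordant with PCR amplification. Data are hypothetical.

def concordant_chromosomes(panel, pcr_positive):
    """Return chromosomes present in every PCR-positive hybrid and absent
    from every PCR-negative hybrid.

    panel:        dict of hybrid name -> set of human chromosomes retained
    pcr_positive: dict of hybrid name -> True if an amplified fragment was seen
    """
    all_chromosomes = set().union(*panel.values())
    matches = []
    for chrom in sorted(all_chromosomes):
        if all((chrom in retained) == pcr_positive[hybrid]
               for hybrid, retained in panel.items()):
            matches.append(chrom)
    return matches


if __name__ == "__main__":
    toy_panel = {
        "H1": {"1", "7"},
        "H2": {"7", "X"},
        "H3": {"1", "X"},
        "H4": {"2"},
    }
    toy_results = {"H1": True, "H2": True, "H3": False, "H4": False}
    print(concordant_chromosomes(toy_panel, toy_results))
```

An empty result would suggest that the panel does not resolve
the assignment, or that a hybrid was scored incorrectly.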

Fluorescence in situ hybridization (FISH) of a DNA
sequence to a metaphase chromosomal spread can further be
used to provide a precise chromosomal location in one
step. For a review of this technique, see Verma et al.,
(Human Chromosomes: A Manual of Basic Techniques
(Pergamon Press, New York, 1988)).

Reagents for chromosome mapping can be used
individually to mark a single chromosome or a single site
on that chromosome, or panels of reagents can be used for
marking multiple sites and/or multiple chromosomes.
Reagents corresponding to noncoding regions of the genes
actually are preferred for mapping purposes. Coding
sequences are more likely to be conserved within gene
families, thus increasing the chance of cross
hybridizations during chromosomal mapping.

Once a sequence has been mapped to a precise
chromosomal location, the physical position of the
sequence on the chromosome can be correlated with genetic
map data. (Such data are found, for example, in V.
McKusick, Mendelian Inheritance in Man, available on-line
through Johns Hopkins University Welch Medical Library).
The relationship between genes and disease, mapped to the
same chromosomal region, can then be identified through
linkage analysis (co-inheritance of physically adjacent
genes), described in, e.g., Egeland et al. (1987) Nature
325:783-787.

Moreover, differences in the DNA sequences between
individuals affected and unaffected with a disease
associated with a gene of the invention can be
determined. If a mutation is observed in some or all of
the affected individuals but not in any unaffected
individuals, then the mutation is likely to be the
causative agent of the particular disease. Comparison of
affected and unaffected individuals generally involves
first looking for structural alterations in the
chromosomes such as deletions or translocations that are
visible from chromosome spreads or detectable using PCR
based on that DNA sequence. Ultimately, complete
sequencing of genes from several individuals can be
performed to confirm the presence of a mutation and to
distinguish mutations from polymorphisms.
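
The comparison of affected and unaffected individuals described
above can be expressed as a simple set operation, sketched here
in Python. The sketch assumes that sequence differences have
already been called for each individual and are represented as
text labels; the labels, identifiers and function name are
hypothetical.

```python
# Illustrative sketch: keep variants seen in affected individuals but in no
# unaffected individual. Variant labels and identifiers are hypothetical.

def candidate_mutations(affected, unaffected):
    """Map each variant found only in affected individuals to the number
    of affected individuals carrying it.

    affected, unaffected: dict of individual ID -> set of variant labels
    """
    seen_in_unaffected = set().union(*unaffected.values())
    carrier_counts = {}
    for variants in affected.values():
        # Variants shared with any unaffected individual are set aside as
        # likely polymorphisms rather than candidate causative mutations.
        for variant in variants - seen_in_unaffected:
            carrier_counts[variant] = carrier_counts.get(variant, 0) + 1
    return carrier_counts


if __name__ == "__main__":
    toy_affected = {"A1": {"c.100G>A", "c.250C>T"}, "A2": {"c.100G>A"}}
    toy_unaffected = {"U1": {"c.250C>T"}}
    print(candidate_mutations(toy_affected, toy_unaffected))
```

The carrier count reflects that a mutation may be observed in
some or all of the affected individuals; confirmation by
sequencing genes from several individuals remains necessary, as
stated above.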




2. Tissue Typing

The nucleic acid sequences of the present invention can
also be used to identify individuals from minute
biological samples. The United States military, for
example, is considering the use of restriction fragment
length polymorphism (RFLP) for identification of its
personnel. In this technique, an individual's genomic
DNA is digested with one or more restriction enzymes, and
probed on a Southern blot to yield unique bands for
identification. This method does not suffer from the
current limitations of "Dog Tags" which can be lost,
switched, or stolen, making positive identification
difficult. The sequences of the present invention are
useful as additional DNA markers for RFLP (described in
U.S. Patent 5,272,057).

Furthermore, the sequences of the present invention can
be used to provide an alternative technique which
determines the actual base-by-base DNA sequence of
selected portions of an individual's genome. Thus, the
nucleic acid sequences described herein can be used to
prepare two PCR primers from the 5' and 3' ends of the
sequences. These primers can then be used to amplify an
individual's DNA and subsequently sequence it.

Panels of corresponding DNA sequences from individuals,
prepared in this manner, can provide unique individual
identifications, as each individual will have a unique
set of such DNA sequences due to allelic differences.
The sequences of the present invention can be used to
obtain such identification sequences from individuals and
from tissue. The nucleic acid sequences of the invention
uniquely represent portions of the human genome. Allelic
variation occurs to some degree in the coding regions of
these sequences, and to a greater degree in the noncoding
regions. It is estimated that allelic variation between
individual humans occurs with a frequency of about once
per each 500 bases. Each of the sequences described
herein can, to some degree, be used as a standard against
which DNA from an individual can be compared for
identification purposes. Because greater numbers of
polymorphisms occur in the noncoding regions, fewer
sequences are necessary to differentiate individuals.
For example, the noncoding sequences of SEQ ID NO:1 can
comfortably provide positive individual identification
with a panel of perhaps 10 to 1,000 primers which each
yield a noncoding amplified sequence of 100 bases. If
predicted coding sequences, such as those in SEQ ID NO:3
are used, a more appropriate number of primers for
positive individual identification would be 500-2,000.
If a panel of reagents from the nucleic acid sequences
described herein is used to generate a unique
identification database for an individual, those same
reagents can later be used to identify tissue from that
individual. Using the unique identification database,
positive identification of the individual, living or
dead, can be made from extremely small tissue samples.
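
The panel sizes given above can be related to the stated variation frequency by simple arithmetic. The following sketch is illustrative only: it assumes the genome-wide estimate of about one variable position per 500 bases applies uniformly to every amplified sequence, which understates variation in noncoding regions and overstates it in coding regions. The function name and defaults are hypothetical.

```python
def expected_variable_positions(n_primers, amplicon_len=100,
                                bases_per_variation=500):
    """Expected number of variable positions covered by a primer panel,
    assuming one variable position per `bases_per_variation` bases."""
    return n_primers * amplicon_len / bases_per_variation


# Panel sizes spanning the range mentioned in the text.
for n in (10, 100, 1000):
    print(n, expected_variable_positions(n))
# 10 2.0
# 100 20.0
# 1000 200.0
```

Under this simplified assumption, even a 10-primer panel is expected to cover about two variable positions, and larger panels scale linearly.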

3. Use of Partial Gene Sequences in Forensic Biology

DNA-based identification techniques can also be used in
forensic biology. Forensic biology is a scientific field
employing genetic typing of biological evidence found at
a crime scene as a means for positively identifying, for
example, a perpetrator of a crime. To make such an
identification, PCR technology can be used to amplify DNA
sequences taken from very small biological samples such
as tissues, e.g., hair or skin, or body fluids, e.g.,
blood, saliva, or semen found at a crime scene. The
amplified sequence can then be compared to a standard,
thereby allowing identification of the origin of the
biological sample.

The sequences of the present invention can be used to
provide polynucleotide reagents, e.g., PCR primers,
targeted to specific loci in the human genome, which can
enhance the reliability of DNA-based forensic
identifications by, for example, providing another
"identification marker" (i.e., another DNA sequence that
is unique to a particular individual). As mentioned
above, actual base sequence information can be used for
identification as an accurate alternative to patterns
formed by restriction enzyme generated fragments.
Sequences targeted to noncoding regions are particularly
appropriate for this use as greater numbers of
polymorphisms occur in the noncoding regions, making it
easier to differentiate individuals using this technique.
Examples of polynucleotide reagents include the nucleic
acid sequences of the invention or portions thereof,
e.g., fragments derived from noncoding regions having a
length of at least 20 or 30 bases.

The nucleic acid sequences described herein can further
be used to provide polynucleotide reagents, e.g., labeled
or labelable probes which can be used in, for example, an
in situ hybridization technique, to identify a specific
tissue, e.g., brain tissue. This can be very useful in
cases where a forensic pathologist is presented with a
tissue of unknown origin. Panels of such probes can be
used to identify tissue by species and/or by organ type.

C. Predictive Medicine 

The present invention also pertains to the field of
predictive medicine in which diagnostic assays,
prognostic assays, pharmacogenomics, and monitoring
clinical trials are used for prognostic (predictive)
purposes to thereby treat an individual prophylactically.
Accordingly, one aspect of the present invention relates
to diagnostic assays for determining expression of a
polypeptide or nucleic acid of the invention and/or
activity of a polypeptide of the invention, in the
context of a biological sample (e.g., blood, serum,
cells, tissue) to thereby determine whether an individual
is afflicted with a disease or disorder, or is at risk of
developing a disorder, associated with aberrant
expression or activity of a polypeptide of the invention.
The invention also provides for prognostic (or
predictive) assays for determining whether an individual
is at risk of developing a disorder associated with
aberrant expression or activity of a polypeptide of the
invention. For example, mutations in a gene of the
invention can be assayed in a biological sample. Such
assays can be used for prognostic or predictive purposes
to thereby prophylactically treat an individual prior to
the onset of a disorder characterized by or associated
with aberrant expression or activity of a polypeptide of
the invention.

Another aspect of the invention provides methods for
determining expression of a nucleic acid or polypeptide
of the invention or activity of a polypeptide of the
invention in an individual to thereby select appropriate
therapeutic or prophylactic agents for that individual
(referred to herein as "pharmacogenomics").
Pharmacogenomics allows for the selection of agents
(e.g., drugs) for therapeutic or prophylactic treatment
of an individual based on the genotype of the individual
(e.g., the genotype of the individual examined to
determine the ability of the individual to respond to a
particular agent).

Yet another aspect of the invention pertains to
monitoring the influence of agents (e.g., drugs or other
compounds) on the expression or activity of a polypeptide
of the invention in clinical trials.

These and other agents are described in further detail
in the following sections.



1. Diagnostic Assays 

An exemplary method for detecting the presence or
absence of a polypeptide or nucleic acid of the invention
in a biological sample involves obtaining a biological
sample from a test subject and contacting the biological
sample with a compound or an agent capable of detecting a
polypeptide or nucleic acid (e.g., mRNA, genomic DNA) of
the invention such that the presence of a polypeptide or
nucleic acid of the invention is detected in the
biological sample. A preferred agent for detecting mRNA
or genomic DNA encoding a polypeptide of the invention is
a labeled nucleic acid probe capable of hybridizing to
mRNA or genomic DNA encoding a polypeptide of the
invention. The nucleic acid probe can be, for example, a
full-length cDNA, such as the nucleic acid of SEQ ID
NOs:1-22, 34-43, and -, or a portion thereof, such
as an oligonucleotide of at least 15, 30, 50, 100, 250 or
500 nucleotides in length and sufficient to specifically
hybridize under stringent conditions to a mRNA or genomic
DNA encoding a polypeptide of the invention. Other
suitable probes for use in the diagnostic assays of the
invention are described herein.

A preferred agent for detecting a polypeptide of the
invention is an antibody capable of binding to a
polypeptide of the invention, preferably an antibody with
a detectable label. Antibodies can be polyclonal, or
more preferably, monoclonal. An intact antibody, or a
fragment thereof (e.g., Fab or F(ab')2) can be used. The
term "labeled", with regard to the probe or antibody, is
intended to encompass direct labeling of the probe or
antibody by coupling (i.e., physically linking) a
detectable substance to the probe or antibody, as well as
indirect labeling of the probe or antibody by reactivity
with another reagent that is directly labeled. Examples
of indirect labeling include detection of a primary
antibody using a fluorescently labeled secondary antibody
and end-labeling of a DNA probe with biotin such that it
can be detected with fluorescently labeled streptavidin.
The term "biological sample" is intended to include
tissues, cells and biological fluids isolated from a
subject, as well as tissues, cells and fluids present
within a subject. That is, the detection method of the
invention can be used to detect mRNA, protein, or genomic
DNA in a biological sample in vitro as well as in vivo.
For example, in vitro techniques for detection of mRNA
include Northern hybridizations and in situ
hybridizations. In vitro techniques for detection of a
polypeptide of the invention include enzyme linked
immunosorbent assays (ELISAs), Western blots,
immunoprecipitations and immunofluorescence. In vitro
techniques for detection of genomic DNA include Southern
hybridizations. Furthermore, in vivo techniques for
detection of a polypeptide of the invention include
introducing into a subject a labeled antibody directed
against the polypeptide. For example, the antibody can
be labeled with a radioactive marker whose presence and
location in a subject can be detected by standard imaging
techniques.

In one embodiment, the biological sample contains
protein molecules from the test subject. Alternatively,
the biological sample can contain mRNA molecules from the
test subject or genomic DNA molecules from the test
subject. A preferred biological sample is a peripheral
blood leukocyte sample isolated by conventional means
from a subject.

In another embodiment, the methods further involve
obtaining a control biological sample from a control




subject, contacting the control sample with a compound or
agent capable of detecting a polypeptide of the invention
or mRNA or genomic DNA encoding a polypeptide of the
invention, such that the presence of the polypeptide or
mRNA or genomic DNA encoding the polypeptide is detected
in the biological sample, and comparing the presence of
the polypeptide or mRNA or genomic DNA encoding the
polypeptide in the control sample with the presence of
the polypeptide or mRNA or genomic DNA encoding the
polypeptide in the test sample.

The invention also encompasses kits for detecting the
presence of a polypeptide or nucleic acid of the
invention in a biological sample (a test sample). Such
kits can be used to determine if a subject is suffering
from or is at increased risk of developing a disorder
associated with aberrant expression of a polypeptide of
the invention (e.g., an immunological disorder). For
example, the kit can comprise a labeled compound or agent
capable of detecting the polypeptide or mRNA encoding the
polypeptide in a biological sample and means for
determining the amount of the polypeptide or mRNA in the
sample (e.g., an antibody which binds the polypeptide or
an oligonucleotide probe which binds to DNA or mRNA
encoding the polypeptide). Kits can also include
instructions for observing that the tested subject is
suffering from or is at risk of developing a disorder
associated with aberrant expression of the polypeptide if
the amount of the polypeptide or mRNA encoding the
polypeptide is above or below a normal level.

For antibody-based kits, the kit can comprise, for
example: (1) a first antibody (e.g., attached to a solid
support) which binds to a polypeptide of the invention;
and, optionally, (2) a second, different antibody which
binds to either the polypeptide or the first antibody and
is conjugated to a detectable agent.




For oligonucleotide-based kits, the kit can comprise,
for example: (1) an oligonucleotide, e.g., a detectably
labeled oligonucleotide, which hybridizes to a nucleic
acid sequence encoding a polypeptide of the invention or
(2) a pair of primers useful for amplifying a nucleic
acid molecule encoding a polypeptide of the invention.

The kit can also comprise, e.g., a buffering agent, a
preservative, or a protein stabilizing agent. The kit
can also comprise components necessary for detecting the
detectable agent (e.g., an enzyme or a substrate). The
kit can also contain a control sample or a series of
control samples which can be assayed and compared to the
test sample contained. Each component of the kit is
usually enclosed within an individual container and all
of the various containers are within a single package
along with instructions for observing whether the tested
subject is suffering from or is at risk of developing a
disorder associated with aberrant expression of the
polypeptide.

2. Prognostic Assays

The methods described herein can furthermore be
utilized as diagnostic or prognostic assays to identify
subjects having or at risk of developing a disease or
disorder associated with aberrant expression or activity
of a polypeptide of the invention. For example, the
assays described herein, such as the preceding diagnostic
assays or the following assays, can be utilized to
identify a subject having or at risk of developing a
disorder associated with aberrant expression or activity
of a polypeptide of the invention. Alternatively, the
prognostic assays can be utilized to identify a subject
having or at risk for developing such a disease or
disorder. Thus, the present invention provides a method
in which a test sample is obtained from a subject and a
polypeptide or nucleic acid (e.g., mRNA, genomic DNA) of
the invention is detected, wherein the presence of the
polypeptide or nucleic acid is diagnostic for a subject
having or at risk of developing a disease or disorder
associated with aberrant expression or activity of the
polypeptide. As used herein, a "test sample" refers to a
biological sample obtained from a subject of interest.
For example, a test sample can be a biological fluid
(e.g., serum), cell sample, or tissue.

Furthermore, the prognostic assays described herein can
be used to determine whether a subject can be
administered an agent (e.g., an agonist, antagonist,
peptidomimetic, protein, peptide, nucleic acid, small
molecule, or other drug candidate) to treat a disease or
disorder associated with aberrant expression or activity
of a polypeptide of the invention. For example, such
methods can be used to determine whether a subject can be
effectively treated with a specific agent or class of
agents (e.g., agents of a type which decrease activity of
the polypeptide). Thus, the present invention provides
methods for determining whether a subject can be
effectively treated with an agent for a disorder
associated with aberrant expression or activity of a
polypeptide of the invention in which a test sample is
obtained and the polypeptide or nucleic acid encoding the
polypeptide is detected (e.g., wherein the presence of
the polypeptide or nucleic acid is diagnostic for a
subject that can be administered the agent to treat a
disorder associated with aberrant expression or activity
of the polypeptide).

The methods of the invention can also be used to detect
genetic lesions or mutations in a gene of the invention,
thereby determining if a subject with the lesioned gene
is at risk for a disorder characterized by aberrant
expression or activity of a polypeptide of the invention.
In preferred embodiments, the methods include detecting,
in a sample of cells from the subject, the presence or
absence of a genetic lesion or mutation characterized by
at least one of an alteration affecting the integrity of
a gene encoding the polypeptide of the invention, or the
mis-expression of the gene encoding the polypeptide of
the invention. For example, such genetic lesions or
mutations can be detected by ascertaining the existence
of at least one of: 1) a deletion of one or more
nucleotides from the gene; 2) an addition of one or more
nucleotides to the gene; 3) a substitution of one or more
nucleotides of the gene; 4) a chromosomal rearrangement
of the gene; 5) an alteration in the level of a messenger
RNA transcript of the gene; 6) an aberrant modification
of the gene, such as of the methylation pattern of the
genomic DNA; 7) the presence of a non-wild type splicing
pattern of a messenger RNA transcript of the gene; 8) a
non-wild type level of the protein encoded by the gene;
9) an allelic loss of the gene; and 10) an inappropriate
post-translational modification of the protein encoded by
the gene. As described herein, there are a large number
of assay techniques known in the art which can be used
for detecting lesions in a gene.

In certain embodiments, detection of the lesion
involves the use of a probe/primer in a polymerase chain
reaction (PCR) (see, e.g., U.S. Patent Nos. 4,683,195 and
4,683,202), such as anchor PCR or RACE PCR, or,
alternatively, in a ligation chain reaction (LCR) (see,
e.g., Landegren et al. (1988) Science 241:1077-1080; and
Nakazawa et al. (1994) Proc. Natl. Acad. Sci. USA 91:360-
364), the latter of which can be particularly useful for
detecting point mutations in a gene (see, e.g., Abravaya
et al. (1995) Nucleic Acids Res. 23:675-682). This
method can include the steps of collecting a sample of
cells from a patient, isolating nucleic acid (e.g.,
genomic, mRNA or both) from the cells of the sample,
contacting the nucleic acid sample with one or more
primers which specifically hybridize to the selected gene
under conditions such that hybridization and
amplification of the gene (if present) occurs, and
detecting the presence or absence of an amplification
product, or detecting the size of the amplification
product and comparing the length to a control sample. It
is anticipated that PCR and/or LCR may be desirable to
use as a preliminary amplification step in conjunction
with any of the techniques used for detecting mutations
described herein.
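
The step of detecting the size of an amplification product and comparing it to a control can be illustrated computationally. The following is a minimal sketch only: it models a primer as annealing wherever it matches the template exactly, and all sequences, the primer pair, and the function names are hypothetical examples rather than sequences of the invention.

```python
def revcomp(seq):
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def amplicon_length(template, fwd, rev):
    """Length of the product bounded by the forward primer and the site
    bound by the reverse primer, or None if either primer fails to match
    the template exactly."""
    start = template.find(fwd)
    if start == -1:
        return None
    rev_site = revcomp(rev)  # reverse primer site as read on the top strand
    end = template.find(rev_site, start + len(fwd))
    if end == -1:
        return None
    return end + len(rev_site) - start


# Hypothetical primers and templates; the sample carries a 4-base deletion.
fwd = "GACCATGG"
rev = "TGCCTAAG"
control = "TT" + fwd + "AAAACCCCGGGG" + revcomp(rev) + "TC"
sample = "TT" + fwd + "AAAACCGG" + revcomp(rev) + "TC"

print(amplicon_length(control, fwd, rev))  # 28
print(amplicon_length(sample, fwd, rev))   # 24
```

A difference in product length between sample and control, as in this example, would point to an insertion or deletion between the primer sites; a point substitution would not change the length and would require one of the other techniques described herein.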

Alternative amplification methods include: self
sustained sequence replication (Guatelli et al. (1990)
Proc. Natl. Acad. Sci. USA 87:1874-1878), transcriptional
amplification system (Kwoh, et al. (1989) Proc. Natl.
Acad. Sci. USA 86:1173-1177), Q-Beta Replicase (Lizardi
et al. (1988) Bio/Technology 6:1197), or any other
nucleic acid amplification method, followed by the
detection of the amplified molecules using techniques
well known to those of skill in the art. These detection
schemes are especially useful for the detection of
nucleic acid molecules if such molecules are present in
very low numbers.

In an alternative embodiment, mutations in a selected
gene from a sample cell can be identified by alterations
in restriction enzyme cleavage patterns. For example,
sample and control DNA is isolated, amplified
(optionally), digested with one or more restriction
endonucleases, and fragment length sizes are determined
by gel electrophoresis and compared. Differences in
fragment length sizes between sample and control DNA
indicate mutations in the sample DNA. Moreover,
sequence specific ribozymes (see, e.g., U.S. Patent
No. 5,498,531) can be used to score for the presence of
specific mutations by development or loss of a ribozyme
cleavage site.
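
The comparison of restriction fragment lengths between sample and control DNA can be sketched as an in silico digest. The example below is an assumption-laden illustration: it treats the DNA as a single linear strand, uses the EcoRI recognition site (GAATTC, cut after the first G), and operates on hypothetical sequences; the function name `digest` is likewise hypothetical.

```python
def digest(seq, site, cut_offset):
    """Sorted fragment lengths after cutting a linear sequence at every
    occurrence of `site`, `cut_offset` bases into the site."""
    cuts = []
    pos = seq.find(site)
    while pos != -1:
        cuts.append(pos + cut_offset)
        pos = seq.find(site, pos + 1)
    bounds = [0] + cuts + [len(seq)]
    return sorted(b - a for a, b in zip(bounds, bounds[1:]))


ECORI_SITE, ECORI_OFFSET = "GAATTC", 1  # EcoRI cuts G^AATTC

# Hypothetical control with two sites; the sample has a point mutation
# (GAATTC -> GAGTTC) that destroys the second site.
control = "ATATAT" + "GAATTC" + "CCCCGGGG" + "GAATTC" + "TTTT"
sample = "ATATAT" + "GAATTC" + "CCCCGGGG" + "GAGTTC" + "TTTT"

print(digest(control, ECORI_SITE, ECORI_OFFSET))  # [7, 9, 14]
print(digest(sample, ECORI_SITE, ECORI_OFFSET))   # [7, 23]
```

Loss of the second site merges the 9-base and 14-base fragments into a single 23-base fragment, which is the kind of difference in fragment length sizes that gel electrophoresis would reveal.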

In other embodiments, genetic mutations can be
identified by hybridizing a sample and control nucleic
acids, e.g., DNA or RNA, to high density arrays
containing hundreds or thousands of oligonucleotide
probes (Cronin et al. (1996) Human Mutation 7:244-255;
Kozal et al. (1996) Nature Medicine 2:753-759). For
example, genetic mutations can be identified in two-
dimensional arrays containing light-generated DNA probes
as described in Cronin et al., supra. Briefly, a first
hybridization array of probes can be used to scan through
long stretches of DNA in a sample and control to identify
base changes between the sequences by making linear
arrays of sequential overlapping probes. This step
allows the identification of point mutations. This step
is followed by a second hybridization array that allows
the characterization of specific mutations by using
smaller, specialized probe arrays complementary to all
variants or mutations detected. Each mutation array is
composed of parallel probe sets, one complementary to the
wild-type gene and the other complementary to the mutant
gene.
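
The first scanning step, which uses linear arrays of sequential overlapping probes, can be modeled in a simplified way. The sketch below is not the array design of Cronin et al.; it assumes equal-length reference and sample sequences, considers substitutions only, compares same-strand sequences rather than true complements, and treats any imperfect match as a loss of hybridization. All names and sequences are hypothetical.

```python
def tiling_probes(reference, probe_len, step=1):
    """Sequential overlapping probes covering the reference,
    returned as (start offset, probe sequence) pairs."""
    return [(i, reference[i:i + probe_len])
            for i in range(0, len(reference) - probe_len + 1, step)]


def mismatched_windows(reference, sample, probe_len):
    """Offsets of probes that do not match the sample perfectly at the
    same position (a stand-in for lost hybridization signal)."""
    return [i for i, probe in tiling_probes(reference, probe_len)
            if sample[i:i + probe_len] != probe]


reference = "ACGTACGTTAGC"
sample = "ACGTACATTAGC"  # hypothetical G -> A change at 0-based index 6

print(mismatched_windows(reference, sample, 4))  # [3, 4, 5, 6]
```

The run of consecutive failing probes brackets the altered base: with probes of length 4 starting at offsets 3 through 6, the only position shared by all of them is index 6.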

In yet another embodiment, any of a variety of
sequencing reactions known in the art can be used to
directly sequence the selected gene and detect mutations
by comparing the sequence of the sample nucleic acids
with the corresponding wild-type (control) sequence.
Examples of sequencing reactions include those based on
techniques developed by Maxam and Gilbert ((1977) Proc.
Natl. Acad. Sci. USA 74:560) or Sanger ((1977) Proc.
Natl. Acad. Sci. USA 74:5463). It is also contemplated
that any of a variety of automated sequencing procedures
can be utilized when performing the diagnostic assays
((1995) Bio/Techniques 19:448), including sequencing by
mass spectrometry (see, e.g., PCT Publication No. WO
94/16101; Cohen et al. (1996) Adv. Chromatogr. 36:127-
162; and Griffin et al. (1993) Appl. Biochem. Biotechnol.
38:147-159).
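
Once the sample has been sequenced, the comparison with the wild-type (control) sequence is straightforward to express. The following minimal sketch assumes the two sequences are already aligned and of equal length, so that only substitutions are reported; insertions and deletions would require an alignment step that is not shown. The function name and sequences are hypothetical.

```python
def substitutions(wild_type, sample):
    """List (1-based position, wild-type base, sample base) for every
    position at which two equal-length sequences differ."""
    if len(wild_type) != len(sample):
        raise ValueError("this sketch handles equal-length sequences only")
    return [(i + 1, w, s)
            for i, (w, s) in enumerate(zip(wild_type, sample))
            if w != s]


print(substitutions("ATGGCCTAA", "ATGACCTAA"))  # [(4, 'G', 'A')]
```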

Other methods for detecting mutations in a selected
gene include methods in which protection from cleavage
agents is used to detect mismatched bases in RNA/RNA or
RNA/DNA heteroduplexes (Myers et al. (1985) Science
230:1242). In general, the technique of "mismatch
cleavage" entails providing heteroduplexes formed by
hybridizing (labeled) RNA or DNA containing the wild-type
sequence with potentially mutant RNA or DNA obtained from
a tissue sample. The double-stranded duplexes are
treated with an agent which cleaves single-stranded
regions of the duplex such as which will exist due to
basepair mismatches between the control and sample
strands. RNA/DNA duplexes can be treated with RNase to
digest mismatched regions, and DNA/DNA hybrids can be
treated with S1 nuclease to digest mismatched regions.
In other embodiments, either DNA/DNA or RNA/DNA duplexes
can be treated with hydroxylamine or osmium tetroxide and
with piperidine in order to digest mismatched regions.
After digestion of the mismatched regions, the resulting
material is then separated by size on denaturing
polyacrylamide gels to determine the site of mutation.
See, e.g., Cotton et al. (1988) Proc. Natl. Acad. Sci.
USA 85:4397; Saleeba et al. (1992) Methods Enzymol.
217:286-295. In a preferred embodiment, the control DNA
or RNA can be labeled for detection.

In still another embodiment, the mismatch cleavage
reaction employs one or more proteins that recognize
mismatched base pairs in double-stranded DNA (so called
"DNA mismatch repair" enzymes) in defined systems for
detecting and mapping point mutations in cDNAs obtained
from samples of cells. For example, the mutY enzyme of
E. coli cleaves A at G/A mismatches and the thymidine DNA
glycosylase from HeLa cells cleaves T at G/T mismatches
(Hsu et al. (1994) Carcinogenesis 15:1657-1662).
According to an exemplary embodiment, a probe based on a
selected sequence, e.g., a wild-type sequence, is
hybridized to a cDNA or other DNA product from a test
cell(s). The duplex is treated with a DNA mismatch
repair enzyme, and the cleavage products, if any, can be
detected from electrophoresis protocols or the like.
See, e.g., U.S. Patent No. 5,459,039.

In other embodiments, alterations in electrophoretic
mobility will be used to identify mutations in genes.
For example, single strand conformation polymorphism
(SSCP) may be used to detect differences in
electrophoretic mobility between mutant and wild type
nucleic acids (Orita et al. (1989) Proc. Natl. Acad. Sci.
USA 86:2766; see also Cotton (1993) Mutat. Res. 285:125-
144; Hayashi (1992) Genet. Anal. Tech. Appl. 9:73-79).
Single-stranded DNA fragments of sample and control
nucleic acids will be denatured and allowed to renature.
The secondary structure of single-stranded nucleic acids
varies according to sequence, and the resulting
alteration in electrophoretic mobility enables the
detection of even a single base change. The DNA
fragments may be labeled or detected with labeled probes.
The sensitivity of the assay may be enhanced by using RNA
(rather than DNA), in which the secondary structure is
more sensitive to a change in sequence. In a preferred
embodiment, the subject method utilizes heteroduplex
analysis to separate double stranded heteroduplex
molecules on the basis of changes in electrophoretic
mobility (Keen et al. (1991) Trends Genet. 7:5).

In yet another embodiment, the movement of mutant or
wild-type fragments in polyacrylamide gels containing a
gradient of denaturant is assayed using denaturing
gradient gel electrophoresis (DGGE) (Myers et al. (1985)
Nature 313:495). When DGGE is used as the method of
analysis, DNA will be modified to ensure that it does not
completely denature, for example by adding a GC clamp of
approximately 40 bp of high-melting GC-rich DNA by PCR.
In a further embodiment, a temperature gradient is used
in place of a denaturing gradient to identify differences
in the mobility of control and sample DNA (Rosenbaum and
Reissner (1987) Biophys. Chem. 265:12753).

Examples of other techniques for detecting point
mutations include, but are not limited to, selective
oligonucleotide hybridization, selective amplification,
or selective primer extension. For example,
oligonucleotide primers may be prepared in which the
known mutation is placed centrally and then hybridized to
target DNA under conditions which permit hybridization
only if a perfect match is found (Saiki et al. (1986)
Nature 324:163; Saiki et al. (1989) Proc. Natl. Acad.
Sci. USA 86:6230). Such allele specific oligonucleotides
are hybridized to PCR amplified target DNA or a number of
different mutations when the oligonucleotides are
attached to the hybridizing membrane and hybridized with
labeled target DNA.
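
The design of an allele specific oligonucleotide with the known mutation placed centrally, and its use under perfect-match conditions, can be sketched as follows. This is a simplified model under stated assumptions: hybridization is reduced to an exact same-strand substring match (strand complementarity and melting behavior are ignored), and the reference sequence, variant position, and function names are hypothetical.

```python
def aso_probe(reference, pos, base, flank=5):
    """Oligonucleotide carrying `base` at 0-based position `pos` of the
    reference, centered between `flank` reference bases on each side."""
    if pos - flank < 0 or pos + flank >= len(reference):
        raise ValueError("variant too close to the end of the reference")
    return reference[pos - flank:pos] + base + reference[pos + 1:pos + flank + 1]


def hybridizes(probe, target):
    """Perfect-match model: the probe binds only if it occurs exactly."""
    return probe in target


# Hypothetical wild-type target and a C -> T mutation at index 13.
wild_type = "TTGACCATGGCATCGATCGGATCCAAGT"
pos, mutant_base = 13, "T"
mutant = wild_type[:pos] + mutant_base + wild_type[pos + 1:]

wt_probe = aso_probe(wild_type, pos, wild_type[pos])
mut_probe = aso_probe(wild_type, pos, mutant_base)

print(hybridizes(mut_probe, mutant), hybridizes(mut_probe, wild_type))
print(hybridizes(wt_probe, wild_type), hybridizes(wt_probe, mutant))
```

In this model each probe binds only its own allele, which mirrors the requirement above that hybridization occur only if a perfect match is found.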

Alternatively, allele specific amplification technology
which depends on selective PCR amplification may be used
in conjunction with the instant invention.
Oligonucleotides used as primers for specific
amplification may carry the mutation of interest in the
center of the molecule (so that amplification depends on
differential hybridization) (Gibbs et al. (1989) Nucleic
Acids Res. 17:2437-2448) or at the extreme 3' end of one
primer where, under appropriate conditions, mismatch can
prevent or reduce polymerase extension (Prossner (1993)
Tibtech 11:238). In addition, it may be desirable to
introduce a novel restriction site in the region of the
mutation to create cleavage-based detection (Gasparini et
al. (1992) Mol. Cell Probes 6:1). It is anticipated that
in certain embodiments amplification may also be
performed using Taq ligase for amplification (Barany
(1991) Proc. Natl. Acad. Sci. USA 88:189). In such
cases, ligation will occur only if there is a perfect
match at the 3' end of the 5' sequence making it possible
to detect the presence of a known mutation at a specific
site by looking for the presence or absence of
amplification.

The methods described herein may be performed, for
example, by utilizing pre-packaged diagnostic kits
comprising at least one probe nucleic acid or antibody
reagent described herein, which may be conveniently used,
e.g., in clinical settings to diagnose patients
exhibiting symptoms or family history of a disease or
illness involving a gene encoding a polypeptide of the
invention.

Furthermore, any cell type or tissue, preferably
peripheral blood leukocytes, in which the polypeptide of
the invention is expressed may be utilized in the
prognostic assays described herein.

3. Pharmacogenomics

Agents, or modulators which have a stimulatory or
inhibitory effect on activity or expression of a
polypeptide of the invention as identified by a screening
assay described herein can be administered to individuals
to treat (prophylactically or therapeutically) disorders
associated with aberrant activity of the polypeptide. In
conjunction with such treatment, the pharmacogenomics
(i.e., the study of the relationship between an
individual's genotype and that individual's response to a
foreign compound or drug) of the individual may be
considered. Differences in metabolism of therapeutics
can lead to severe toxicity or therapeutic failure by
altering the relation between dose and blood
concentration of the pharmacologically active drug. Thus,
the pharmacogenomics of the individual permits the
selection of effective agents (e.g., drugs) for
prophylactic or therapeutic treatments based on a
consideration of the individual's genotype. Such
pharmacogenomics can further be used to determine
appropriate dosages and therapeutic regimens.

Accordingly, the activity of a polypeptide of the
invention, expression of a nucleic acid of the
invention, or mutation content of a gene of the invention
in an individual can be determined to thereby select
appropriate agent(s) for therapeutic or prophylactic
treatment of the individual.

Pharmacogenomics deals with clinically significant
hereditary variations in the response to drugs due to
altered drug disposition and abnormal action in affected
persons. See, e.g., Linder (1997) Clin. Chem. 43(2):254-
266. In general, two types of pharmacogenetic conditions
can be differentiated. Genetic conditions transmitted as
a single factor altering the way drugs act on the body
are referred to as "altered drug action." Genetic
conditions transmitted as single factors altering the way
the body acts on drugs are referred to as "altered drug
metabolism". These pharmacogenetic conditions can occur
either as rare defects or as polymorphisms. For example,
glucose-6-phosphate dehydrogenase deficiency (G6PD) is a
common inherited enzymopathy in which the main clinical
complication is haemolysis after ingestion of oxidant
drugs (anti-malarials, sulfonamides, analgesics,
nitrofurans) and consumption of fava beans.

As an illustrative embodiment, the activity of drug
metabolizing enzymes is a major determinant of both the
intensity and duration of drug action. The discovery of
genetic polymorphisms of drug metabolizing enzymes (e.g.,
N-acetyltransferase 2 (NAT 2) and cytochrome P450 enzymes
CYP2D6 and CYP2C19) has provided an explanation as to why
some patients do not obtain the expected drug effects or
show exaggerated drug response and serious toxicity after
taking the standard and safe dose of a drug. These
polymorphisms are expressed in two phenotypes in the
population, the extensive metabolizer (EM) and poor
metabolizer (PM). The prevalence of PM is different
among different populations. For example, the gene
coding for CYP2D6 is highly polymorphic and several
mutations have been identified in PM, which all lead to
the absence of functional CYP2D6. Poor metabolizers of
CYP2D6 and CYP2C19 quite frequently experience
exaggerated drug response and side effects when they
receive standard doses. If a metabolite is the active
therapeutic moiety, a PM will show no therapeutic
response, as demonstrated for the analgesic effect of
codeine mediated by its CYP2D6-formed metabolite
morphine. The other extreme are the so-called ultra-rapid
metabolizers who do not respond to standard doses.
Recently, the molecular basis of ultra-rapid metabolism
has been identified to be due to CYP2D6 gene
amplification.
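
The relationship between genotype and metabolizer phenotype described
above can be illustrated with a short sketch. It is only a simplified
model, assuming that phenotype can be assigned from the number of
functional gene copies; the function names and the copy-number
cut-offs are hypothetical and are not taken from this application.

```python
# Illustrative only: a simplified genotype-to-phenotype lookup.
# The copy-number cut-offs below are assumptions made for this sketch.

def metabolizer_phenotype(functional_copies: int) -> str:
    """Assign a metabolizer phenotype from functional gene copy number.

    0 copies    -> "PM" (poor metabolizer, no functional enzyme)
    1-2 copies  -> "EM" (extensive metabolizer)
    3 or more   -> "ultra-rapid" (gene amplification)
    """
    if functional_copies < 0:
        raise ValueError("copy number cannot be negative")
    if functional_copies == 0:
        return "PM"
    if functional_copies <= 2:
        return "EM"
    return "ultra-rapid"


# Qualitative response to a standard dose, keyed by phenotype and by
# whether the parent drug or its enzyme-formed metabolite is the active
# moiety (for example, codeine is converted to morphine by CYP2D6).
RESPONSE = {
    ("PM", "parent"): "exaggerated response and side effects",
    ("PM", "metabolite"): "no therapeutic response",
    ("EM", "parent"): "expected response",
    ("EM", "metabolite"): "expected response",
    ("ultra-rapid", "parent"): "no response at standard dose",
}


def expected_response(phenotype: str, active_species: str) -> str:
    """Look up the qualitative response; unknown cases are flagged."""
    return RESPONSE.get((phenotype, active_species), "not addressed in the text")


if __name__ == "__main__":
    phenotype = metabolizer_phenotype(0)
    print(phenotype, "->", expected_response(phenotype, "metabolite"))
```

A lookup table is used rather than branching logic so that each
combination stated in the text stays visible and combinations the text
does not address are reported as such instead of being guessed.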

Thus, the activity of a polypeptide of the invention,
expression of a nucleic acid encoding the polypeptide, or
mutation content of a gene encoding the polypeptide in an
individual can be determined to thereby select
appropriate agent(s) for therapeutic or prophylactic
treatment of the individual. In addition,
pharmacogenetic studies can be used to apply genotyping
of polymorphic alleles encoding drug-metabolizing enzymes
to the identification of an individual's drug
responsiveness phenotype. This knowledge, when applied
to dosing or drug selection, can avoid adverse reactions
or therapeutic failure and thus enhance therapeutic or
prophylactic efficiency when treating a subject with a
modulator of activity or expression of the polypeptide,
such as a modulator identified by one of the exemplary
screening assays described herein.

4. Monitoring of Effects During Clinical Trials 
Monitoring the influence of agents (e.g., drugs,
compounds) on the expression or activity of a polypeptide
of the invention (e.g., the ability to modulate aberrant
cell proliferation and/or differentiation) can be applied
not only in basic drug screening, but also in clinical
trials. For example, the effectiveness of an agent, as
determined by a screening assay as described herein, to
increase gene expression, protein levels or protein
activity, can be monitored in clinical trials of subjects
exhibiting decreased gene expression, protein levels, or
protein activity. Alternatively, the effectiveness of an
agent, as determined by a screening assay, to decrease
gene expression, protein levels or protein activity, can
be monitored in clinical trials of subjects exhibiting
increased gene expression, protein levels, or protein
activity. In such clinical trials, expression or
activity of a polypeptide of the invention and,
preferably, that of other polypeptides that have been
implicated in, for example, a cellular proliferation
disorder, can be used as a marker of the immune
responsiveness of a particular cell.

For example, and not by way of limitation, genes,
including those of the invention, that are modulated in
cells by treatment with an agent (e.g., compound, drug or
small molecule) which modulates activity or expression of
a polypeptide of the invention (e.g., as identified in a
screening assay described herein) can be identified.
Thus, to study the effect of agents on cellular
proliferation disorders, for example, in a clinical
trial, cells can be isolated and RNA prepared and
analyzed for the levels of expression of a gene of the
invention and other genes implicated in the disorder.
The levels of gene expression (i.e., a gene expression
pattern) can be quantified by Northern blot analysis or
RT-PCR, as described herein, or alternatively by
measuring the amount of protein produced, by one of the
methods as described herein, or by measuring the levels
of activity of a gene of the invention or other genes.
In this way, the gene expression pattern can serve as a
marker, indicative of the physiological response of the
cells to the agent. Accordingly, this response state may
be determined before, and at various points during,
treatment of the individual with the agent.

In a preferred embodiment, the present invention
provides a method for monitoring the effectiveness of
treatment of a subject with an agent (e.g., an agonist,
antagonist, peptidomimetic, protein, peptide, nucleic
acid, small molecule, or other drug candidate identified
by the screening assays described herein) comprising the
steps of (i) obtaining a pre-administration sample from a
subject prior to administration of the agent; (ii)
detecting the level of the polypeptide or nucleic acid of
the invention in the pre-administration sample; (iii)
obtaining one or more post-administration samples from
the subject; (iv) detecting the level of the
polypeptide or nucleic acid of the invention in the
post-administration samples; (v) comparing the level of the
polypeptide or nucleic acid of the invention in the
pre-administration sample with the level of the polypeptide
or nucleic acid of the invention in the
post-administration sample or samples; and (vi) altering the
administration of the agent to the subject accordingly.
For example, increased administration of the agent may be
desirable to increase the expression or activity of the
polypeptide to higher levels than detected, i.e., to
increase the effectiveness of the agent. Alternatively,
decreased administration of the agent may be desirable to
decrease expression or activity of the polypeptide to
lower levels than detected, i.e., to decrease the
effectiveness of the agent.
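
Steps (v) and (vi) of the monitoring method above can be sketched as
a simple comparison routine. This is a minimal illustration, assuming
an agent that raises the measured level (as in the example just given)
and a target window chosen by the investigator; the function name, the
target window, and the units are hypothetical and do not come from
this application.

```python
from statistics import mean


def adjust_administration(pre_level, post_levels, target_low, target_high):
    """Compare pre- and post-administration levels and suggest an action.

    pre_level    -- level detected in the pre-administration sample
    post_levels  -- levels detected in one or more post-administration samples
    target_low, target_high -- assumed desired window for the level

    Assumes the agent increases the level, so a level below the window
    suggests more agent and a level above the window suggests less.
    """
    if not post_levels:
        raise ValueError("at least one post-administration sample is required")
    if target_low > target_high:
        raise ValueError("target_low must not exceed target_high")

    post_mean = mean(post_levels)
    if post_mean < target_low:
        action = "increase"
    elif post_mean > target_high:
        action = "decrease"
    else:
        action = "maintain"

    return {
        "post_mean": post_mean,
        "change_from_baseline": post_mean - pre_level,
        "action": action,
    }


if __name__ == "__main__":
    # Arbitrary units; the values are made up for illustration.
    print(adjust_administration(10.0, [12.0, 14.0], 20.0, 30.0))
```

Averaging the post-administration samples is only one possible choice;
a trial protocol might instead track each time point separately.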

C. Methods of Treatment 

The present invention provides for both prophylactic
and therapeutic methods of treating a subject at risk of
(or susceptible to) a disorder or having a disorder
associated with aberrant expression or activity of a
polypeptide of the invention.

1. Prophylactic Methods

In one aspect, the invention provides a method for
preventing in a subject, a disease or condition
associated with an aberrant expression or activity of a
polypeptide of the invention, by administering to the
subject an agent which modulates expression or at least
one activity of the polypeptide. Subjects at risk for a
disease which is caused or contributed to by aberrant
expression or activity of a polypeptide of the invention
can be identified by, for example, any or a combination
of diagnostic or prognostic assays as described herein.
Administration of a prophylactic agent can occur prior to
the manifestation of symptoms characteristic of the
aberrancy, such that a disease or disorder is prevented
or, alternatively, delayed in its progression. Depending
on the type of aberrancy, for example, an agonist or
antagonist agent can be used for treating the subject.
The appropriate agent can be determined based on
screening assays described herein.

2. Therapeutic Methods

Another aspect of the invention pertains to methods of
modulating expression or activity of a polypeptide of the
invention for therapeutic purposes. The modulatory
method of the invention involves contacting a cell with
an agent that modulates one or more of the activities of
the polypeptide. An agent that modulates activity can be
an agent as described herein, such as a nucleic acid or a
protein, a naturally-occurring cognate ligand of the
polypeptide, a peptide, a peptidomimetic, or other small
molecule. In one embodiment, the agent stimulates one or
more of the biological activities of the polypeptide.
Examples of such stimulatory agents include the active
polypeptide of the invention and a nucleic acid molecule
encoding the polypeptide of the invention that has been
introduced into the cell. In another embodiment, the
agent inhibits one or more of the biological activities
of the polypeptide of the invention. Examples of such
inhibitory agents include antisense nucleic acid
molecules and antibodies. These modulatory methods can
be performed in vitro (e.g., by culturing the cell with
the agent) or, alternatively, in vivo (e.g., by
administering the agent to a subject). As such, the
present invention provides methods of treating an
individual afflicted with a disease or disorder
characterized by aberrant expression or activity of a
polypeptide of the invention. In one embodiment, the
method involves administering an agent (e.g., an agent
identified by a screening assay described herein), or
combination of agents that modulates (e.g., upregulates
or downregulates) expression or activity. In another
embodiment, the method involves administering a
polypeptide of the invention or a nucleic acid molecule
of the invention as therapy to compensate for reduced or
aberrant expression or activity of the polypeptide.
Stimulation of activity is desirable in situations in
which activity or expression is abnormally low or
downregulated and/or in which increased activity is
likely to have a beneficial effect. Conversely,
inhibition of activity is desirable in situations in
which activity or expression is abnormally high or
upregulated and/or in which decreased activity is likely
to have a beneficial effect.

This invention is further illustrated by the following
examples which should not be construed as limiting. The
contents of all references, patents and published patent
applications cited throughout this application are hereby
incorporated by reference.

EXAMPLES 

TANGO 180, TANGO 181, TANGO 182, TANGO 183, TANGO 184,
TANGO 185, TANGO 186, TANGO 188, TANGO 189 and TANGO 187
were identified in a human prostate epithelial cell
library. TANGO 215 was identified in a human prostate
stromal cell library.

TANGO 180, TANGO 181, TANGO 182, TANGO 183, TANGO 184,
TANGO 185, TANGO 186, TANGO 188, TANGO 189, TANGO 215,
and TANGO 187 were identified by first analyzing clones
present in the two libraries to identify EST sequences
which potentially encode a signal peptide having at least
15 amino acids. Selected clones which include an EST
sequence that appeared to encode a signal peptide having
at least 15 amino acids were used to assemble additional
EST sequences to form potential full-length gene
sequences. The assembled full-length gene sequences were
then used to identify actual full-length clones in the
two libraries.
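
The first screening step above, selecting EST sequences that
potentially encode a signal peptide of at least 15 amino acids, can be
illustrated with a crude heuristic. The sketch below is not the
procedure used for the TANGO clones, which is not detailed here; it
assumes translation from the first ATG and a hypothetical rule that a
candidate has an uninterrupted hydrophobic stretch near its N-terminus.
The residue set, window, and core length are assumptions chosen for
illustration.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Assumed set of hydrophobic residues for this sketch.
HYDROPHOBIC = set("AVLIMFWC")


def translate_from_first_atg(est: str) -> str:
    """Translate from the first ATG until a stop codon or the end."""
    seq = "".join(est.split()).upper()
    start = seq.find("ATG")
    if start < 0:
        return ""
    residues = []
    for i in range(start, len(seq) - 2, 3):
        aa = CODON_TABLE.get(seq[i:i + 3], "X")  # X marks an ambiguous codon
        if aa == "*":
            break
        residues.append(aa)
    return "".join(residues)


def may_encode_signal_peptide(est, min_length=15, window=30, min_core=7):
    """Heuristic screen: long enough, with a hydrophobic core near the start."""
    protein = translate_from_first_atg(est)
    if len(protein) < min_length:
        return False
    longest = run = 0
    for aa in protein[:window]:
        run = run + 1 if aa in HYDROPHOBIC else 0
        longest = max(longest, run)
    return longest >= min_core


if __name__ == "__main__":
    # Illustrative fragment only, written with spaces between codons.
    fragment = ("GCAG ATG GTG ACT CCG CGG CCC GCG CCC GCC CGG GGC CCC GCG "
                "CTC CTC CTC CTC CTG CTG CTG GCC ACT GCG CGC GGG CAG GAA")
    print(translate_from_first_atg(fragment))
    print(may_encode_signal_peptide(fragment))
```

A production screen would more likely rely on an established signal
peptide prediction method; the hydrophobic-run rule here only conveys
the idea of filtering translated ESTs by an N-terminal feature.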
Deposit of Clones 

Clones containing cDNA molecules encoding TANGO 180,
TANGO 181, TANGO 182, TANGO 183, TANGO 184, TANGO 185,
TANGO 186, TANGO 188, TANGO 189, TANGO 215 and TANGO 187
were deposited with the American Type Culture Collection
(Manassas, VA) as composite deposits.

Clones encoding TANGO 180, TANGO 181, TANGO 182,
TANGO 183, and TANGO 184 were deposited on September 25,
1998 with the American Type Culture Collection under
accession number ATCC 98901, from which each strain
comprising a particular cDNA clone is obtainable. This
deposit is a mixture of five strains, each carrying one
recombinant plasmid harboring a particular cDNA clone.
To distinguish the strains and isolate a strain harboring
a particular cDNA clone, one can first streak out an
aliquot of the mixture to single colonies on nutrient
medium (e.g., LB plates) supplemented with 100 µg/ml
ampicillin, grow single colonies, and then extract the
plasmid DNA using a standard minipreparation procedure.
Next, one can digest a sample of the DNA minipreparation
with a combination of the restriction enzymes Sal I and
Not I and resolve the resultant products on a 0.8%
agarose gel using standard DNA electrophoresis
conditions. The digest will liberate fragments as
follows:

TANGO 180 (EpT180)   1.2 kb and 2.7 kb
TANGO 181 (EpT181)   4.5 kb and 2.7 kb
TANGO 182 (EpT182)   two 2.7 kb fragments
TANGO 183 (EpT183)   1.6 kb and 2.7 kb
TANGO 184 (EpT184)   4.5 kb

The identity of the strains can be inferred from the 
fragments liberated. 
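
The inference from liberated fragments to strain identity can be
sketched as a lookup against the fragment sizes listed above for ATCC
98901. This is an illustration under stated assumptions: fragments of
equal size are taken to co-migrate as a single band, band sizes are
matched within a hypothetical tolerance of 0.1 kb, and the function
name is invented for this sketch.

```python
# Expected Sal I / Not I fragments (kb) as listed above for ATCC 98901.
EXPECTED_FRAGMENTS_KB = {
    "TANGO 180 (EpT180)": (1.2, 2.7),
    "TANGO 181 (EpT181)": (4.5, 2.7),
    "TANGO 182 (EpT182)": (2.7, 2.7),
    "TANGO 183 (EpT183)": (1.6, 2.7),
    "TANGO 184 (EpT184)": (4.5,),
}


def identify_strains(observed_bands_kb, tolerance_kb=0.1):
    """Return the strains whose expected band pattern matches the gel.

    observed_bands_kb -- sizes (kb) of the distinct bands seen on the gel
    tolerance_kb      -- assumed sizing error allowed per band
    """
    observed = sorted(observed_bands_kb)
    matches = []
    for strain, fragments in EXPECTED_FRAGMENTS_KB.items():
        # Equal-sized fragments are assumed to appear as one band.
        expected = sorted(set(fragments))
        if len(expected) != len(observed):
            continue
        if all(abs(o - e) <= tolerance_kb for o, e in zip(observed, expected)):
            matches.append(strain)
    return matches


if __name__ == "__main__":
    print(identify_strains([2.7, 1.2]))
```

Returning a list rather than a single name keeps ambiguous or
unmatched patterns visible, which matters when band sizes are close
to one another.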

Clones encoding TANGO 185, TANGO 186, TANGO 187, TANGO
188 and TANGO 189 (splice variant 1) were deposited on
September 25, 1998 with the American Type Culture
Collection under accession number ATCC 98900, from which
each strain comprising a particular cDNA clone is
obtainable. The deposit is a mixture of five strains,
each carrying one recombinant plasmid harboring a
particular cDNA clone. To distinguish the strains and
isolate a strain harboring a particular cDNA clone, one
can first streak out an aliquot of the mixture to single
colonies on nutrient medium (e.g., LB plates)
supplemented with 100 µg/ml ampicillin, grow single
colonies, and then extract the plasmid DNA using a
standard minipreparation procedure. Next, one can digest
a sample of the DNA minipreparation with a combination of
the restriction enzymes Sal I and Not I and resolve the
resultant products on a 0.8% agarose gel using standard
DNA electrophoresis conditions. The digest will liberate
one vector fragment of 2.7 kb common to all strains, and
one insert-specific fragment as follows:

TANGO 185 (EpT185)      2.1 kb
TANGO 186 (EpT186)      3.7 kb
TANGO 187 (EpT187)      2.6 kb
TANGO 188 (EpT188)      2.0 kb
TANGO 189 (EpT189sv1)   1.3 kb

The identity of the strains can be inferred from the
fragments liberated.

A clone encoding TANGO 215 and four other clones were
deposited on September 25, 1998 with the American Type
Culture Collection under accession number ATCC 98899,
from which the strain comprising the TANGO 215 cDNA clone
is obtainable. To distinguish the strains and isolate a
strain harboring the TANGO 215 cDNA clone, one can first
streak out an aliquot of the mixture to single colonies
on nutrient medium (e.g., LB plates) supplemented with
100 µg/ml ampicillin, grow single colonies, and then
extract the plasmid DNA using a standard minipreparation
procedure. Next, one can digest a sample of the DNA
minipreparation with a combination of the restriction
enzymes Sal I and Not I and resolve the resultant
products on a 0.8% agarose gel using standard DNA
electrophoresis conditions.

The digest will liberate one vector fragment of 2.7 kb
common to all strains, and one insert-specific fragment
as follows:

TANGO 215 (EpT215) 2.8 kb 

The identity of the strain harboring the TANGO 215 cDNA 
clone can be inferred from the fragments liberated. 

Equivalents

The contents of all references, patents and published
patent applications cited throughout this application are
hereby incorporated by reference. Those skilled in the
art will recognize, or be able to ascertain using no more
than routine experimentation, many equivalents to the
specific embodiments of the invention described herein.
Such equivalents are intended to be encompassed by the
following claims.

What is claimed is:

1. An isolated nucleic acid molecule selected from
the group consisting of:

a) a nucleic acid molecule comprising a nucleotide
sequence which is at least 55% identical to the
nucleotide sequence of any of SEQ ID NOs:1-22, 34-43, and
___, the cDNA insert of a plasmid deposited with
the ATCC as any of Accession Numbers 98899, 98900, and
98901, or a complement thereof;

b) a nucleic acid molecule comprising a fragment of
at least 300 nucleotides of the nucleotide sequence of
any of SEQ ID NOs:1-22, 34-43, and ___, the cDNA
insert of a plasmid deposited with the ATCC as any of
Accession Numbers 98899, 98900, and 98901, or a
complement thereof;

c) a nucleic acid molecule which encodes a
polypeptide comprising the amino acid sequence of any of
SEQ ID NOs:23-33, 54-63, and ___ or an amino acid
sequence encoded by the cDNA insert of a plasmid
deposited with the ATCC as any of Accession Numbers
98899, 98900, and 98901;

d) a nucleic acid molecule which encodes a fragment
of a polypeptide comprising the amino acid sequence of
any of SEQ ID NOs:23-33, 54-63, and ___ wherein the
fragment comprises at least 15 contiguous amino acids of
any of SEQ ID NOs:23-33, 54-63, and ___ or the
polypeptide encoded by the cDNA insert of a plasmid
deposited with the ATCC as any of Accession Numbers
98899, 98900, and 98901; and

e) a nucleic acid molecule which encodes a naturally
occurring allelic variant of a polypeptide comprising the
amino acid sequence of any of SEQ ID NOs:23-33, 54-63,
and ___ or an amino acid sequence encoded by the
cDNA insert of a plasmid deposited with ATCC as any of
Accession Numbers 98899, 98900, and 98901, wherein the
nucleic acid molecule hybridizes to a nucleic acid
molecule comprising any of SEQ ID NOs:1-22, 34-43, and
___ or a complement thereof under stringent
conditions.

2. The isolated nucleic acid molecule of claim 1,
which is selected from the group consisting of:

a) a nucleic acid molecule comprising the nucleotide
sequence of any of SEQ ID NOs:1-22 and 34-43, the cDNA
insert of a plasmid deposited with the ATCC as any of
Accession Numbers 98899, 98900, and 98901, or a
complement thereof; and

b) a nucleic acid molecule which encodes a
polypeptide comprising the amino acid sequence of any of
SEQ ID NOs:23-33, 54-63, and ___ or an amino acid
sequence encoded by the cDNA insert of a plasmid
deposited with the ATCC as any of Accession Numbers
98899, 98900, and 98901.

3. The nucleic acid molecule of claim 1 further
comprising vector nucleic acid sequences.

4. The nucleic acid molecule of claim 1 further
comprising nucleic acid sequences encoding a heterologous
polypeptide.

5. A host cell which contains the nucleic acid
molecule of claim 1.

6. The host cell of claim 5 which is a mammalian host
cell.

7. A non-human mammalian host cell containing the
nucleic acid molecule of claim 1.

8. An isolated polypeptide selected from the group
consisting of:

a) a fragment of a polypeptide comprising the amino
acid sequence of any of SEQ ID NOs:23-33, 54-63, and
___, wherein the fragment comprises at least 15
contiguous amino acids of any of SEQ ID NOs:23-33 and
54-63, and ___;

b) a naturally occurring allelic variant of a
polypeptide comprising the amino acid sequence of any of
SEQ ID NOs:23-33, 54-63, and ___ or an amino acid
sequence encoded by the cDNA insert of a plasmid
deposited with the ATCC as any of Accession Numbers
98899, 98900, and 98901, wherein the polypeptide is
encoded by a nucleic acid molecule which hybridizes to a
nucleic acid molecule comprising any of SEQ ID NOs:1-22,
34-43, and ___ or a complement thereof under
stringent conditions; and

c) a polypeptide which is encoded by a nucleic acid
molecule comprising a nucleotide sequence which is at
least 55% identical to a nucleic acid comprising the
nucleotide sequence of any of SEQ ID NOs:1-22, 34-43, and
___ or a complement thereof.

9. The isolated polypeptide of claim 8 comprising the
amino acid sequence of any of SEQ ID NOs:23-33, 54-63,
and ___ or an amino acid sequence encoded by the
cDNA insert of a plasmid deposited with the ATCC as any
of Accession Numbers 98899, 98900, and 98901.

10. The polypeptide of claim 8 further comprising
heterologous amino acid sequences.

11. An antibody which selectively binds to a
polypeptide of claim 8.

12. A method for producing a polypeptide selected from
the group consisting of:

a) a polypeptide comprising the amino acid sequence
of any of SEQ ID NOs:23-33, 54-63, and ___ or an
amino acid sequence encoded by the cDNA insert of a
plasmid deposited with the ATCC as any of Accession
Numbers 98899, 98900, and 98901;

b) a polypeptide comprising a fragment of the amino
acid sequence of any of SEQ ID NOs:23-33, 54-63, and
___ or an amino acid sequence encoded by the cDNA
insert of a plasmid deposited with the ATCC as any of
Accession Numbers 98899, 98900, and 98901, wherein the
fragment comprises at least 15 contiguous amino acids of
any of SEQ ID NOs:23-33, 54-63, and ___ or an amino
acid sequence encoded by the cDNA insert of a plasmid
deposited with the ATCC as any of Accession Numbers
98899, 98900, and 98901; and

c) a naturally occurring allelic variant of a
polypeptide comprising the amino acid sequence of any of
SEQ ID NOs:23-33, 54-63, and ___ or an amino acid
sequence encoded by the cDNA insert of a plasmid
deposited with the ATCC as any of Accession Numbers
98899, 98900, and 98901, wherein the polypeptide is
encoded by a nucleic acid molecule which hybridizes to a
nucleic acid molecule comprising the nucleotide sequence
of any of SEQ ID NOs:1-22, 54-63, and ___ or a
complement thereof under stringent conditions;

comprising culturing the host cell of claim 5 under
conditions in which the nucleic acid molecule is
expressed.

13. A method for detecting the presence of a
polypeptide of claim 8 in a sample, comprising:

a) contacting the sample with a compound which
selectively binds to a polypeptide of claim 8; and

b) determining whether the compound binds to the
polypeptide in the sample.

14. The method of claim 13, wherein the compound which
binds to the polypeptide is an antibody.

15. A kit comprising a compound which selectively
binds to a polypeptide of claim 8 and instructions for
use.

16. A method for detecting the presence of a nucleic
acid molecule of claim 1 in a sample, comprising the
steps of:

a) contacting the sample with a nucleic acid probe or
primer which selectively hybridizes to the nucleic acid
molecule; and

b) determining whether the nucleic acid probe or
primer binds to a nucleic acid molecule in the sample.

17. The method of claim 16, wherein the sample
comprises mRNA molecules and is contacted with a nucleic
acid probe.

18. A kit comprising a compound which selectively
hybridizes to a nucleic acid molecule of claim 1 and
instructions for use.

19. A method for identifying a compound which binds to
a polypeptide of claim 8 comprising the steps of:

a) contacting a polypeptide, or a cell expressing a
polypeptide of claim 8 with a test compound; and

b) determining whether the polypeptide binds to the
test compound.

20. The method of claim 19, wherein the binding of the
test compound to the polypeptide is detected by a method
selected from the group consisting of:

a) detection of binding by direct detecting of the
binding of the test compound to the polypeptide binding;
and

b) detection of binding using a competition binding
assay.

21. A method for modulating the activity of a
polypeptide of claim 8 comprising contacting a
polypeptide or a cell expressing a polypeptide of claim 8
with a compound which binds to the polypeptide in a
sufficient concentration to modulate the activity of the
polypeptide.

22. A method for identifying a compound which
modulates the activity of a polypeptide of claim 8,
comprising:

a) contacting a polypeptide of claim 8 with a test
compound; and

b) determining the effect of the test compound on the
activity of the polypeptide to thereby identify a
compound which modulates the activity of the polypeptide.
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CTCGACCCACGCGTCCGanXaiATATCGAGCTCGCT GC T ^ 79 

MALL 4 

GGACTCTGTGGGGACGmCCCCGCGCCCCCGCTCCGGGACCCGTACMCCCG^ ATG GCC CTO CTC 1S4 

SRPALTLLLLLMAAVVRCQE 24 
TCG CCC CCC GCG CTC ACC CTC CTC CTC CTC CTC ATG GCC GCT GTT GTC AGG TCC CAG GAG 214 

QAQTTOWRATLKTIRNGVHK 44 
CAG GCC CAG ACC ACC GAC TGG AGA GCC ACC CTG AAG ACC ATC CGG AAC GGC GTT CAT AAG 274 

lOTYLNAALDLLGOE OGLCO 64 
ATA GAC ACG TAC CTG AAC CCC CCC TTG GAC CTC CTC GGA GGC GAG GAC GGT CTC TGC CAG 334 

VKCSOGSKPFPRYGYKPSPP 84 
TAT AAA TCC AGT GAC GGA TCT AAG CCT TTC CCA CGT TAT GGT TAT AAA CCC TCC CCA CCC 394 

HGCOSPLFGVHLNIGIPSLT104 
AATGGATGTGGCTCTCCACTCTTTCGTGTTCATCTTAAC ATT GGT ATC CCT TCC CTG ACA 454 

fCCCN0KORCirETCGKSKNOC124 
AAG TCT TGC AAC CAA CAC GAC AGG TGC TAT GAG ACC TGT GGC AAA AGC AAG AAT GAC TGT 514 

DBeP0YCLSKZCR0VQKTLG144 
GAT GAA GAA TTC CAG TAT TGC CTC TCC AAG ATC TCC CGA GAT GTA CAG AAA ACA CTA GGA 574 

LTQHVOACETTVELLFDSVr 164 
CTA ACT CAG CAT GTT CAG GCA TCT GAA ACA ACA GTO GAG CTC TTO TTT CAC AGT GTT ATA 634 

HLGCKPirL0S^QRAACRCKYElB4 
CAT TTA GGT TGT AAA CCA TAT CTC GAC AGC CAA CGA GCC CCA TGC AGC TGT CAT TAT GAA 694 

E K T 0 L • 190 
GAA AAA ACT GAT CTT TAA 712 

AGGAGATCCCGACAGCTAGTCACAGATGAAGATGGAAGAACATACCTTTCACAAATAACTAATCTT^ 791 

A C TC TCrrAT TTTT G T GA AACGATT Al I i ICiA CACCTTAAAATAATTTAT ATC TT CA TCTTAAAACCTCAAACCAAAAA 670 

AACTGAGCCACATACTGAGGGCACGGCACC C ' l ' IGlCl ' TC l XrA GGT Al 1 1 lUUtU UXI A T IGCi ' LCC TT A CTTAGTATGC 949 

CAA ArG T C T l t ; ACCAATATCAAAAACAA Gnx;crrG ' rrrA CCGCAGA A ITTT G AAAACAGGAAT^^^ 1028 

ACAACCACATTTACCAAAAAAACAGATCAAATATAAAATTCATCATA ArC T ClG 1 rCA ACATTATCTTATTTCCAAAAT 1107 

GGCGAAATTATCACrrACAACTA f UUI 1 lA CTATCAAATTTTAAATACACATTTATCCCTACAAAAAAAAAAAAAAAA 1 186 



AAAAAAAGGCCCCCCCC 



120J 
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CTCGACCCACCCGTCCCGGCCGGGGTCCTGACCCGGACCaX^^ 79 

MVTPRPAPARGPALLLLL 18 
GCAG ATG GTO ACT CCG CGG CCC GCG CCC GCC COG GGC CCC GCG CTC CTC CTC CTC CTG 137 

LLATARGQEODQTTDWRATL 38 
CTG CTG GCC ACT GCG CGC GGG CAG GAA CAG GAC CAG ACC ACC GAC TGG AGG GCC ACC CTC 197 

KTIRNGIHKIDTYLNAALDL SS 
AAG ACC ATC CCC AAC GGC ATC CAC AAG ATA GAC ACG TAC CTC AAC GCC GCG CTG GAC CTG 257 

LGGEOGLCQYKCSOGSKPVP 78 
CTG GGC GGC GAG GAC GGG CTC TGC CAG TAC AAG TGC AGC GAC GGA TOG AAG CCT GTT CCA 317 

RYCYKPSPPMGCGSPLFGVH 98 
COC TAT GGA TAT AAA CCA TCT CCA CCA AAT GGC TGT GGC TCT CCA CTG TTT CGC GTT CAT 377 

LNtGtPSLTKCCMQHDRCYEUS 
CTC AAC ATA GCT ATC CCT TCC CTG ACC AAG TGC TGC AAC CAG CAC GAC AOA TGC TAT GAG 437 

TCGKSKMDCDEEFQYCLSKri38 
ACC TGC GGG AAA AGC AAG AAC GAC TCT GAC GAG CAC TTC CAG TAC TGC CTC TCC AAG ATC 497 

CRDVQKTLGLSONVQACETT158 
TCC AGA GAC GTC CAG AAG ACC CTC GGA CTA TCT CAC AAC CTC CAG GCA TGT GAG ACA ACG 557 

VBLI.FOSVIHLGCKPYLDS0 178 
GTG GAG CTC CTC TTT GAC AGC GTC ATC CAT TTA CGC TGC AAG CCA TAC CTG GAC AGC CAG 017 

RAACMCRYEEKTDL* 193 
COG GCT CCA TCC TCC TCT CCT TAT GAA GAA AAA ACA GAT CTA TAA $$2 

ACACCCTCACTCCTCCAGACCACCCCACAATGGACGATCATCCTTCCCAAACATCGGATCCT^ 74 1 

CCTTA G TrTT C T C TC CA TCCGTC Al ri T C AGAC Ll 1 ILlA TA ClUmitlTrill lA GAACCTCAAAGTCAAAACCGTC 820 

GGCGCCCACCCAGAAACACACCGAGACCATCCTTGOCATOCGCACCCACCAGCACATCCAAGACCATCCC'^ 899 

CTCGCTCTCTTGCTGCCTCCCCCAAACTCGCAAGAAAACCrTAAGCTOCTGTGACTTGGTCI^ 978 

AATAAAAATCAAACCAAATCTAAAATTCATTGTAAGGA Cl IITO IGCATTATTTT A TTT T iC 1057 

CCTTACAACTATTATTTATTTTGAAATTTCAGATGTACATTTATACCrOCAAAAACTATTAAT 113$ 

ACATAATCTCrrCTTTCTCrCAACCCCACTAACATAOCTATAAATATCTrACrCAAAAC^ X21S 

A I ' L T C : lX;f A CA GI f COA ATCACGGTTGGTA L ' I iC I' C IGC ACACACCCCCCACCACATCTCA G 7 G TT CG GATCTCCACA 1294 

oAATTCACAACCCCAGCITCCTCTCTCACAAACCGCTTAGACTCAATGTCCTTCCTCrCCTGC^ 1373 

*JACCCCTrTAACGCCCCAACCCCACCTCTGAATO\CTCCCCTATCrGCTCCTCACCT^ 1452 

i r; iw CATCTTCTATCCTGGAGTA Gli;: I A AAAGTCTCAC Arrrit rA ATCCACCTCTTAATAAAAGCrATT^ 1531 

TaCTAAAAAAAAAAAAAAAAAAAAAAAAAAGGCCGGCCC 1570 
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ACCUXairTCCCCCaOGCGTCOCX^TOGCGTGC^^ 79 

MAQLGAVVAV 10 
AGCGCCTGCAGGGACAGCCTGGATAAAGGCTCACTG ATG OCT CAC TTG GGA GCA GTT GTQ GCT GTG 145 

AS5FFCASLFSAVKKIEEGH 30 
GCr TCC ACT TTC TTT TGT GCA TCT CTC TTC TCA GCT GTG CAC AAG ATA GAA GAG GGA CAT 205 

IGVyYRGGAXiLTSTSGPOPH .SO 
ATT GGG GTA TAT TAC AGA GGC GGT GCC CTC CTG ACT TCG ACC AGC GGC CCT GGT TTC CAT 265 

LMLPFITSYKSVQTTLQTDE 70 
CTC ATG CTC CCT TTC ATC ACA TCA TAT AAG TCT GTG CAG ACC ACA CTC CAG ACA GAT GAG 325 

VKNVPCGTSGGVMIYFDRIS 90 
GTG AAG AAT GTA CCT TGT GGG ACT ACT GGT GGT GTG ATO ATC TAC TTT GAC AGA ATT GAA 385 

VVMFLVPMAVYDIVKMYTAO 110 
GTQ GTG AAC TTC CTG GTC CCG AAC GCA GTG TAT GAT ATA GTG AAG AAC TAT ACT GCT GAC 445 

YDKALlFNKZKHSLirOFCSV 130 
TAT GAC AAG GCC CTC ATC TTC AAC AAG ATC CAC CAC GAA CTG AAC CAG TTC TGC AGT GTG 505 

HTLOSVYZSLFOQtOfiirLKL ISO 
CAC ACC err CAA gag GTC TAC ATT gag CTC TTT GAT CAG ATT GAT GAA AAT CTC AAA CTG SS5 

ALQODLTSMAPGLVIOAVRV 170 
CCT TTG CAA CAG GAC CTG ACC TCC ATG GCC CCT GGG CTG GTC ATT CAA GCT GTG CGG GTA 625 

TKPNiPeAIRRNYELMBSBX 190 
ACA AAG CCC AAC ATA CCA GAG GCA ATC CGC AGA AAC TAC GAG TTG ATO GAA AGT GAG AAC 605 

TKLLZAAOKQKVVBKBAETe 210 
ACA AAG CTT CTC ATT CCC GCC CAG AAA CAG AAG GTG GTG GAA AAC GAA CCA GAC ACA GAG 745 

RKKALIBAEKVAOVAEITYC 230 
CCC AAC AAC GCC CTC ATT CAC CCA CAA AAA CTC CCC CAC CTC CCT CAC ATC ACC TAC CCG 805 

QKVNEKETBKKISEIEDAAF 250 
CAC AAC CTC ATG GAG AAG CAC ACT CAC AAC AAC ATT TCA CAA ATT CAA CAT CCT CCA TTT 865 

LAREKAKAOAECYTAHKZAE 270 
CTC CCC CGG GAG AAG GCA AAC CCA CAT CCT GAC TGC TAC ACT CCT ATC AAA ATA GCC GAA 925 

A>IKLKLTPeYLQLMKYKArA 290 
CCC AAT AAG CTC AAC CTA ACC CCT CAA TAT CTC CAC CTC ATC AAG TAC AAC CCC ATT CCT 985 

S.VSKIYFCKOIPNMFMDSAC 310 
TCC AAC ACC AAC ATT TAC TTT CCC AAA CAC ATT CCT AAC ATC TTC ATC GAC TCT CCC CCC 1045 

SVSKOFEGLADKLSFCLEDE 330 
ACT CTC AGC AAC CAC TTT GAC CCC CTA CCT CAC AAC CTA ACC TTT GCC TTA CAA GAT CAA 1105 



PLSTATKEN* 
CCC TTC CAo ACG CCC ACT PJ\G CAC AAT TCA 



340 
1135 
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AAAAAACXrCJlTATGACTGaWlTGATACTTAAGCaGATC^ rriTl ' A AGATGftATCRGAATGTTC C lXICCTCCC C 1214 

GJtfrtAC t TrCT C TG A CT G TC T T CCA GTTACTGTG G T G AAA 1293 

AGGJUXSGTGGGGACTGATC»TGGGCCKrrm 13 72 

GGGCTTGACCTTTGACCTCTAGACACTAATTTTATCCl^^ 14 S 1 

GAGAAATGTAGAGTGTTACCrCCAACTCAriTGAm 1530 

TCCAAGGTAGGAGATCTCTGTGGGTCAGGCTCAGCAACTGAG^ 1$09 

AGAAACAGCrGCAGAGAACATTTGACCTTCCTGGCATTC^^ 1688 

TTXWCCCCTCRTAAGGAAGTACTGCTGCTAGCnrr^^ 17€7 

G TT y yr GUlTfC T G ACTACATTTCTAaAGTCAGAGCTTtA^ 1846 

TGTG A ' mUi ' A i 1 11 U 1 1 ULi iLiU KMAATCC l ^ n ' T C A Tl V^ ^ 1 925 

CTCftAGTCTCTTAACAGCTGCTGGACTGGG ArCCT 2004 

TGCACCTCGAGATGAAGTCTCrTTCTAn^ 2083 
GGAACGATCAGTCAAGAa A l' G AXJLlWi' C l rAATC(XTGT^ lOlbLlliGOtftfnGGgrCTGACTTAOTGATA^^ 2162 

ACTCTATTCACTAAGTACCrTGTGTTTTTAAAT^^ 2241 

CAGAGACAGClXiltSTGCAGCAAATCMA<nt^TCCCaW^ 2320 

GOGX k IILVIT r rCATTACTAGGTCAGAACATTTTaAaTnCCTTGGOAGATTQ 2399 

AAAG U ' lTrriCi lA TATCCTGAGATTGACGCCT'ACCGGOTOTCCAACC^ 2478 

GTCaACCGCACCTATCTCCACrrTAACTTCCAACCATATT^^ 2SS7 

CTTATCACCTCACCCCCACCCCCCACCCCCCACCCCCCCCCCCGCO^ 2636 

TCAATG C riC C 1 i ICT G CCAtXJUVTCCCTGC CTl X Z T l T nCU XICCATCCCCACA Cr f C l^ ^ 271S 

ACLi'lC; UA;C AC CT l' G Cil CA GAA Ori l iCC CACCATTGACCCT CC Cl'ACAAACATACACTCTTAG C r CC^^ ^ 2794 
AACTTCC Ca iLi UlGr T CW CACTCCTGCWAL-nCl UG LL ^ 1 lU OUXXAACAAA A TCTG C rrC C OA 2873 

(m;Ara;ATTTTAAT G TC C TC CA GAGTC C TTT CA GAAC C^ 2952 

AATCCCAGCACrrrOCXSAGCXXAAGCCACCCC^ 3031 

ACCCCATCTCTACCAAAAATACAAATATTACCOXXICATOCrr^^ 3110 

GCCACGACAATTCCTTCAACTCCOCACCCACAGCTT^^ 3189 

ACACCAACCCTCCCTCTCAACAAAAGAAGCTCATTTCCCAACACTAGCATAGCCACT 3268 

TTCCTCCCATTTCCCnT^CTATTAATCACTTCTTACAGC^ 3347 

C ItL : X T A CTCrCACX Z T C r' iG i rG CCTACACCTCACtfUAACACCAAT^ 3426 

TCAJCTTCTCT^AATACCACACTrTCCTCAGCT^ 3505 
CCCATGAGaUMAAGgiaXTCAGTAGAGTC^^ 3584
AAACAAAAAATATGTTATCCrAau:ATTA{nt^ 3663

ATCAGCTtSTTTTATTTGCATAGGCAACTAACCTGTCTGT^ 3742 

TCTTAAAACATTTGAATTCTAAACATGTAAAATGTGACACCC^ 3821 

ATAAACAGTTACTTATTTTGATACL\TCnTCCA 3900 

TTCCAJWX5AAAAATCACCTTGGTTGAATGTTTCT 3979 

TAATCACnrmAAAATATAAGGACCGAATGCAIUSG^ 4058 

AGAT(rrGGAaGaATCTGTGAT(»TATAAAAAGOGA^^ 4137 

AGCTGTTrTATAAATGATCATTCACTGTTCCTATUjTlXTrATOT^ 4216 

GTAAATACntSAAAGTAAOATOGTCATACTTAC^^ 4295

GTXXXrTACT G TC TO T G TC A ATGTAACCAGTAC T T C T^ ^ 4374 

rraT AOCCA CXt y mTrf TTT TCUTG T T T CC TT A TAA^ 4451 
GTCGACCCACGCGTCC:GCGGAaK:GTGGCCGCGGAC^ 79



SLFSAVHKIBEGH I GVYYRG 37 
TCT CTC TTC TCA GCT GTG CAC AAG ATA GAA GAG GGA CAT ATT GGA GTA TAT TAC AGA GGT 277 

CALLTSTSCPGFHLMLPFIT .57 
GGT GCC CTG CTG ACC TCC ACC ACT CGC CCG GGT TTC CAT CTC ATG CTC CCG TTC ATC ACA 337 

SYKSVQTTLQTDEVKNVPCG 77 
TCC TAT AAG TCT GTA CAG ACC ACT CTC CXA ACT GAT GAA GTG AAG AAC GTA CCA TCT GGA 397 

TSGGVMIYFDRIE VVNFLVP 97 
ACC ACT GGT GGT GTG ATG ATC TAC TTT GAC AGA ATT GAA GTG GTG AAC TTC CTG GTC CCA 457 

NAVYDI VKNYTADYDKALIF 117 
AAT GCA GTG TAT GAT ATA GTG AAG AAC TAT ACT GCA GAC TAT GAC AAG GCC CTC ATC TTC 517 

NICIHHBLN0FCSVHTL0BVY137 
AAC AAG ATC CAT CAT GAG CTT AAC CAG TTC TGC ACC GTT CAT ACT CTT CAG GAA GTC TAT 577 

tSLFDOIDCMLKLALOOOLT 157 
ATC GAG CTC TTT GAT CAA ATT GAT GAA AAC CTC AAG TTC GCT TTC CAG CAG GAC CTG ACT 637 

SMAPGLVI0AVRVTKPMIPEI77 
TCC ATG GCC CCT OGG CTG GTT ATC CAA GCT GTC CGA GTC ACA AAC CCC AAT ATA CCT GAG 697 

AZRRNYELME$EKTKLLZAA197 
GCA ATC CGC AGO AAC TAT CAG CTC ATG GAA ACC GAG AAG ACC AAG CTT CTC ATT CCA CCC 757 

0K0KVVEKEAETERKKALIE217 
CAG AAG CAG AAG CTG GTC GAA AAG GAC GCA GAA ACA GAG AGG AAG AAC CCC CTC ATT CAG 817 

AEKVAQVABITYGOKVMEKE237 
CCA CAA AAA GTC GCA CAG GTT GCA CAA ATC ACC TAT GCC CAA AAC GTC ATC CAG AAG GAG 877 




rACGCTTGCACGCG CG TrC GGC TGTCT A CGGAGCCCCTGaAGGGACAGCCTGGATACftO 158 



MAQLGAVVAVASS FFCA 
GTTCACTG ATG GCT CAG TTG GGA GCT GTT CTC GCC GTG CCT TCC ACT TTC TTT TCT GCA 



17 
217 



T E K 
ACA GAG AAG 
MNMTQARV 8

GTCGACCCACGCGTCCGGCGGCTGGGCri'ClTCTCAGACGAACGAGA ATG AAT ATG ACT CAA GCC CGC GTT 71 

LVAAVVGLVAVLLYASIHKI 28 
CTG CTG GCT GCA CrrC GTt3 GGG TTG GTG GCT GTC CTC CTC TAC GCC TCC ATC CAC AAC ATT 131 

EEGHLAVYYRCGALLTSPSG 48 
GAG GAG GGC CAT CTG GCT CTG TAC TAC AGG GGA GGA GCT TTA CTA ACT AGC CCC ACT GGA 191 

PGYHIMLPPITTFRSVQTTL 68 
CCA GGC TAT CAT ATC ATG TTG CCT TTC ATT ACT ACG TTC AGA TCP GTG GAG ACA ACA CTA dSX 

QTDEVKNVPCGTSGGVMIYI 88
CAA ACT GAT GAA GTT AAA AAT GTC CCT TGT GGA ACA AGT GGT GGG GTC ATG ATC TAT ATT 311 

ORIEVVMMLAPYAVFDIVRN 108 
CAC CGA ATA GAA GTG GTT AAT ATG TTG GCT CCT TAT GCA GTG TTT GAT ATC GTG AGO AAC 371 

YTAOYDKTLIPNKIHHELNQ 128 
TAT ACT GCA GAT TAT GAC AAG ACC TTA ATC TTC AAT AAA ATC CAC CAT GAG CTG AAC CAG 431 

FCSAHTLQEVYIELFDQIOB 148 
TTC TGC AGT GCC CAC ACA CTT CAG CAA GTT TAC ATT GAA TTG TTT GAT CAA ATA GAT GAA 491 

NLKQALQKDLKLMAPGLTIQ X$B 
AAC CTG AAG CAA GCT CTG CAG AAA GAC TTA AAC CTC ATG GCC CCA GGT CTC ACT ATA CAO S5X 

AVRVTKPKtPEAlRRNFELM 188 
GCT GTG CGT CTT ACA AAA CCC AAA ATC CCA GAA GCC ATA AGA AGA AAT TTT GAC TTA ATG 611 

EAEK TKLLIAAQKQKVVEKE 208 
GAG GCT GAG AAC ACA AAA CTC CTT ATA GCT GCA CAG AAA CAA AAG GTT CTG GAA AAA GAA 671 

AETERKKAVIEAEKXAQVAK 228 
GCT GAG ACA GAG AGG AAA AAG GCA GTT ATA GAA GCA GAC AAG ATT GCA CAA GTC CCA AAA 731 

IRPOOKVUEKBTEKRISEtE 248 
ATT CCG TTT CAG CAG AAA GTG ATG GAA AAA GAA ACT GAA AAG CCC ATT TCT GAA ATC CAA 791 

DAAFLARBKAKADABYYAAH 268 
GAT CCT CCA TTC CTG GCC CCA GAC AAA CCC AAA GCA GAT CCT CAA TAT TAT CCT CCA CAC 851 

KYATSNKHKLTPEYLECKKY 288 
AAA TAT GCC ACC TCA AAC AAG CAC AAG TTG ACC CCG GAA TAT CTG GAC CTC AAA AAG TAC 911 

OACASNSK£YFGSNCPNMFV 308 
CAC CCC ATT CCT TCT AAC AGT AAC ATC TAT TTT GOC AGC AAC ATC CCT AAC ATG TTC CTC 971 

DSSCALKYSDERTGRESSLP 328 
CAC TCC TCA TCT CCT TTG AAA TAT TCA GAT ATT ACC ACT CCA AGA CAA ACC TCA CTC CCC 1031

SKEALEPSGEMVIONKSSTG 348 
TCT AAC CAG GCT CTT CAA CCC TCT OCA GAC AAC GTC ATC CAA AAC AAA CAC AGC ACA CGT 1091 



TGA 



349 
1094 
TGCAAOACXriGGAAATCTTCTCCATATCAACATm^ 1173



CeaCCTGTCTOACACACaAATG G T C TTTT CA GC 1331 

CTCATCWITGAGGAAAGTCTGATGCTAAGATACTCXCTGC^ 1410 

GCMGCCATGCTTGACTAAGGTACCTGGTTTTAGCC^ 1489 

TGGGACAGGGTTTTAACCACAAATAGGAGCAGaiTGCAATTC^ 1568 

CGAAiCrrTTTTATTTTTAAAACTGGATCTGGGGTAT^ 1^47 

GCTGCCATGGTGACAAGCACACTGATGCTCCTTAAtUTTGTT^ 172G 

TAGAAAGCaTCCTTCXSTCATCa i ltilClL L lXLCC^ 1805 

CACCTCCCCCACX»GATCAGGATTOCACTGACGTCCTOGGC^^ 1884 

TAACCTCTGGCATTACySAGACCrACriXATKflXXS ^ ; I Il ' l ' TC CTTCftGTTTSUl Cll ' riVl ' U aGCACCTGTG ^ ^ 1963 

GTACvTT0GGCCTGAGTTTGTGCACCTT<7IT^ 2042 

GACCACTTCTAGAAATCTTTCACCTGTCAGGCCTGTCAGTCTC^ 2121 

GGAAAGGAAAGCCCAGATTTGAATGGGTCTTTCCCCTOGGC^^ 2200 
ll ' TTTtATri ' l ' lliLTCA TTTAATTCTATAA A TTCTCTT TA TJ^ A I lUiUi ' i ' Crn ' AO TT C TCCT' I ' A AAAGAAC 2279 

TTTtCAATTATAAAAATAAAATCrTTACCTGTCQU lilUl ' A^ ^ 2358 



GCGCTCACGftACAGGCAAATtXrCOCTGTGAAGTCTTAAAGCACTT 2516 
CTCTGCCCTCTCAGCTCTGAGCCTCCCCCTC^^ 2S9S 
ATTTATATGTTOAAATtXrrACClTTTTTAAAATAACAAACT 2674 



GATTT A CAGAGAACTTACACYItAlxntrrrCCACCiCTCCn^^ 




llTGGAGGATASAG 1252 



LTTCCtAGAGATGTTrTATAGTTACATGAGCAAAAGCTGTT^ 2437 



AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA 



2704 
GTCGACCCAOGCGTCCGTAAAAA'




M I Y I D R I
TCGAGTC ATG ATC TAT ATT GAC CGA ATA 



7 
72 



EVVNMLAPYAVPOrVRH^TA 27 
OAA GIG GTT AAT ATG TTG GCT CCT TAT GCA GTG TTT GAC ATT GTG AGG AAC TAT ACT CCA 132 

DYDKTLIFNKIHHELNQFCS 47 
OAC TAC GAC AAO ACT TTA ATC TTC AAT AAA ATC CAC CAT GAG CTG AAC CAG TTT TGC AGT 192 

AHTLQEVYIELFOQIOEHLK '67 
GCC CaC ACA CTT CAA GAA GTT TAC ATA GAA TTG TTT GAT CAA ATA GAT GAA AAC CTG AAG 252 

OALOKDLNTMAPGLTIQAVR 87 
CAG GCC CTG CAA AAA GAT TTA AAC ACC ATG GCC CCA CGT CTC ACT ATC CAG GCT GTG CGT 312 

VTKPKI PBAIRRMFELMBAE 107 
GTT ACA AAA CCC AAA ATC CCA GAA GCC ATA AGA AOA AAT TTT GAA TTA ATG GAG 6CA GAG 372 

KTKLI.rAAQK0KVVBKEAET127 
AAG ACA AAA CTT CTC ATA GCT GCA CAG AAA CAA AAG GTG GTG GAG AAA GAA GCT GAG ACG 432 

ERKRAVIEABKtA0VAKlRF147 
GAG AGG AAA AGG GCT GTT ATA GAA GCA GAG AAG ATT GCA CAA GTA GCA AAA ATT CGA TTT 492 

0QKVMBKBTEKRISEIEDAA167 
CAA CAG AAA GTG ATG GAG AAA GAA ACT GAA AAA CGC ATT TCT GAG ATT GAA GAT GCT GCG 552 

FLAREKAKADAEYYAAHKYA187 
TTC CTG GCC CCA GAC AAG GCA AAA GCA GAT GCC GAG TAT TAC GCT GCA CAC AAA TAC GCC 612 

TSNKHXLTPEYLBLKKYQAI 207 
ACC TCA AAC AAG CAC AAA CTG ACC CCA GAG TAT CTG GAG CTC AAG AAA TAC CAG GCC ATT £72 

ASIfSKXYFGSNIPSNFV0SS227 
CCC TCA AAC ACT AAC ATC TAC TTT OGC ACC AAC ATC CCC ACC ATC TTT GTG GAC TCC TCC 732 

CALKYSDGRTGREDSLPPEE 247 
TCT CCT CTC AAA TAC TCT GAT CCT ACG ACT GCC AGA CAA GAC TCC CTT CCC CCA GAC GAC 792 

AREPSCBSP. XQNKSNAG* 205 
GCC CGT GAC CCC TCT CGA GAG ACC CCC ATC CAA AAC AAC GAG AAC CCA GGT TCA 846 

TGCAACAGCTCCAAATGTTCTCCCATATCAAGATGCCACCCAACGCCCTAACTOGGAACAGTGG^ 925 

ACATTCACACAGA A TGT G T C CT Cron ' l ' On^A TTCTCrmiU^TA C T CCa^ 1004 

CCTCTCTGCCACrCAAACCGTCTCTGCACCa\CA (jri UA TCAACTATCCrCTA lG l C T 1083 

ATGAATCAGCGAAACTCroATCCTAACATACTCCCTCCACTCCAATGTCAAACA 1162 

AACCTArrGAATAATGTTTACATTCCTCCCTCACCAC A T C T Gl 'GC'IX J ICACATTCAACACCr 1241 

ACCrrCACAAAACCCTAAGTTAAAGAACACAACTCTCATCACACACTrCGCACC C G aX 1320 

GCCATTCCTCCATCTGATTGACACCCACACCTCTCCC rTCCCAOCAAATTATCrrCCAGTTCAATCACCATTTACTTCA 1399 
TACAAATTGTAC Crn ' L iUH 1 1 1 C TAGTaU3 G TTtXrn » CCTGCWXXWa3CGTACT^^ 1478

craSAAGATRTTCCCAATCACTACrmArrC CG TT A CG 1557 

AAAGCCTCCACTGCACCAAAGCTACGGCrrCCCTGTGTTTCCT 1636

ATGTGTGACTAAAGTGCCCCGTTTTAGCCACAGACAACTGCTTAGAT 1715

GCTTTAACCAGACATAGGAGCACTGTGCAATTCCTGATTCACro 1794 

TC TTTTT A AAACTGGATTTGGOTCACATTCATTCACCCCAACACTTCT 1873 

GTCACTAAOICACTGATTCTCCTTAAAGTAATTCTCGAAGTGTGGAAC^^ 1952 

TTGTCTCCTTCCCTGGGATGCAGATACCGAACT^^ 2031 

GTGALl iL C i UGGO tfjCCATTGAATTCATTTTCCATQAGAAGATGACAgA 2110 

ATCCAGAC C TTTTT G CCaVTCACATTAA CrT T CC TC G AATA llXri^^ ^ 2189 

TGACAG LTCTrC ' roTA TA ClXriXj TT G AAGCCAC^ L'l IjAAACCTCTCAgCTGri'UATC 2268 

TCACaGCAGCTAAAGGCTTOTGCCAAACATTTrATTAAGW 2347 
TTATAGTATACAGGC A ir m T A ATATGGACAAAATAATTTTTCTC^ AATTATAGAAATTACCTTCAAACAGATTTT 2426 

::CCCTrCliPJiT]KCTtsaT^ 2505 

IVTTCCTAGAGAl CT rr C TCATTCCCATTTA^ 2584 

GGATTTCrrACCXCTCATAOCK^XGCrrGACGW 2663 

ACCTCCTTATXXJACTGAGCTTCCCrCTGCCCACTCAC 2742 

AATATACACTCTAATCTTTAAGTCTAAATTTATATC^ 2821 

TATCAAAAAAAAAAAAAAAAACGCCXX3CCG 2851 
CraaCCCftCCCCnXXGGCGGGGACRA L T UXfr^ ^ ^ ^ 79

MKtLSIiVAVVGCLI.V IS 
AGCAAGCCTGATAAGC ATO AAG CTC TTA TCT TTG GTG GCT GTG GTC GGG TGT TTG CTG GTG 140 

PPAEANKSSEDIRCKCICPP 35 
CCC CCA CXrX GAA <XX AAC AAG ACT TCT C»A GAT ATC CGG TCC AAA TO 200 

yRKlSGHIYNQMVSQKDCNC 5S 
TAT AGA AAC ATC ACT GGG CAC ATT TAC AAC CAG AAT GTA TCC CAG AAG GAC TGC AAC TGC 260 

LRVVEPMPVPGHDVEAYCLL 75 
CTG CAC GTG GTG GAG CCC ATG CCA GTG CCT GGC CAT GAC GTG GAG GCC TAC TGC CTG CTG 320 

CBCRYBBRSTTTIKVIlViy 95 
TGC GAG TGC AGG TAC GAG GAG CGC AGC ACC ACC ACC ATC AAG GTC ATC ATT GTC ATC TAC 360 

LSVV0ALLLYMAPLMLVDPL115 
CTC TCC GTG GTG GGT GCC CTG TTG CTC TAC ATG CCC TTC CTG ATG CTG GTG GAC CCT CTG 440 

XRKP0AYTEQLHNEEENEDA13S 
ATC CGA AAG CCG GAT GCA TAC ACT GAG CAA CTG CAC AAT GAG GAG GAG AAT GAG GAT GCT 500 

RSMAAAAASLGGPRAHTVLBISS 
CGC TCT ATG GCAGCAGCTGCTGCATCCCTCGGGGGACCCCCA GCA AAC ACA GTC CTG GAG 560 

RVEGA0QRHKLQV0BQRKTV175 
CGT GTG GAA GGT GCC CAC CAG COG TGO AAG CTG CAG GTG CAG GAG CAG GGG AAG ACA GTC 620 

FDRHKMLS* 184 
TTC CAT CCG CAC AAG ATG CTC ACC TAG 647 

AT GGlR j'K X iT G i' UG TT GG GTCAAGGCCCCAACACCATCGCTGCCAOCrrCCAGGCT^ 726 

CCCTTCC C T aXi l l 'CCACT C T TCCCmA AAAGCCTGTGCC Aini '' iLLi^ ^^^ 805 

TTCGCTATTTTGATTA00G A AGAG0GA ltS 1tX; TC T C TC A TC TCCG TT G ' lLilUl i ' GGGlLl i iGG CGTTGAAGGCAGOG 884 

CGAACCCAGGCCAGAAGGCAATCGACACATTCCAGGCCGCCTCACGACTCGATCCGATCTGTCTa^^ 963 

TrCCCGCCTTCCACCTCTCA t; ' ! Li lUiC AA' fG TT G TT A CCCTTGGAACATAAAGCTGCCTCTTCAGCAACTCACTCTCT 1042 

CGCAGGAAACCATCCCCCACCATTCAGCATCTCTTCCTrTCTCCACTCCTT^^ 1121 

GCCCCTCAGCCCCAGCCCCACCTCCAGCCCTCAGGACACCTCTGATCGGAGAGCrGGGCC 1200 

TCAOGCTCCACTCCAAG C rG G T C TT CC CT G TCCCCrGTGCAClT C r a; CACTOCCGCATGGAGTG^^ 1279 

flCTrCAGGC O TCT GCG CACTCCCT CC T C rCCCCAGTGTCCACAGTCACTGACCCAGAC 1358 

^VCVTCACACTCCACCCTCACCCTCCATCTGAACACCACACCCCCTCTACTrGCGTTG^ 1437 

TCAA LTraSr r G T A CCAGTCCATCCACACAAA AlT TT G T C C lVr i V t r i'T A CA G -^ 1516 

ATTAAATTGTrrTATTTCTCAAAAAAAAAAAAAAAAAAAAGGCCCGCCG 1565 
GTCGaCCCACGCGTCCGXSCCTGCTGATCACTGGaX^CT



79 



MKLLCLVAVVGCLLVPP 17 
GCCCGATAAGC ATG AAG CTC CTG TCT TTG GTG GCT GTG GTG GGG TGC TTG CTG GTG CCC CCA 141 

AQANKSSEDIRCKCICPPYR 37 
GCT CAA GCC AAC AAG AGC TCT GAA GAT ATC CGG TGC AAA TGC ATC TGT CCG CCT TAC AGA 201 

NISGHIYNQNVSQKDCNCLH 57
AAC ATC AGC GGG CAC ATT TAC AAC CAG AAT GTG TCT CAG AAG GAC TGC AAC TGC CTG CAT 261 

VVEPMPVPGHDVEAYCLLCE 77
GTG GTG GAG CCC ATG CCA GTG CCT GGC CAC GAT GTG GAA GCC TAC TGC CTG CTC TGC GAG 321 

CRYEERSTTTIKVXIVlYtS 97 
TCT AGC TAC GAG GAC CCT AGC ACC ACA ACC ATC AAG GTC ATT ATT CTC ATC TAC CTG TCT 381 

VVGALLLYMAFLMLVDPLIR 117 
GTG GTG GGG GCC CTC TTA CTC TAC ATG GCC TTC CTG ATG CTC GTG GAC CCG CTC ATC COG 441 

KPDAYTEQLHNEEENEDART 137
AAG CCA GAT GCC TAT ACT GAG CAG CTG CAC AAT GAA GAG GAG AAT GAG GAT GCT CGC ACC 501

MATAAASIGGPRAKTVLERV 157
ATG GCA ACA GCC GCT GCG TCC ATT CGA GGA CCC CGG GCA AAC ACT GTC CTC GAG CGG GTG 561

EGAQQRWKLQVQEQRKTVFD 177
GAA GGC GCT CAG CAG CGG TGG AAC CTG CAG GTG CAG GAG CAG CGG AAG ACG GTC TTC GAC 621 



R H K M L S * 
CCA CAC AAG ATG CTC ACT TAG 



184 
642 



CCTTCAAATGCCCATGC CGrrr ATarrrCir C CTCTCTACAAATGTACrCGA C TG lTA T 800 

TCTCTCTAOCTCrCTGCGGGGTAGAOGGCAOGGGACGGAAGGCAGAAGGGAACAGAGACATTTGACGTGGC 879 

TC CC T GC AATTCATC CC T CC T G T C TTCACCATTCCTCCCACCTCCACATCrrAACCATGOT^ 956 

CTCATCAACAGCTCAGTGOCTCCCAOCAAACTATGATCCACCCCTCACCCTTC C CTCTAOGATGCTC 1037 
CCa O I ILL I lU UnXXrACTACTTTAACTTCGCCTACCCCAGTCTCAOCAAClbl lUiUiiU:<J CClt;A GCCCACAGTC 1116 

ATCTCCACACTCCACCT CG AAC CLXX; i n CCCCT C ntX:r C C C CTCCTGGTCC A CCACTGCATGCCAC^ 1195 

CCGCATATTCACCACCTCTCACCTTACTCCCATCCCAGCAJCCCGTAACCCCrCCCACCTCTC C CCT 1274 

CCrGACCCATAAAGTTCGACCATATCACACAAGCCCAATCGGGACCGGAGTACCATCCCrCCrCTCCT^ 1353 

rrCTCCCTCAAriTCATTCTATCATCCATGCACACAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA^ 1432 

AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACGGGGGC 1510 
GAATTCCGCACGAGOGGAl

M A T L W G 
CCGGGAGCCGCrrCGCG<XXjGCTCCGGGCTtriXXX3^ ATG GCG ACC CTG TGG GGA 



6 
149 



CLLRLGSLLSLSCLALSVLL 2$ 
GGC CTT CTT COG CTT GGC TCC TTG CTC ACC CTG TCG TGC CTG GCG CTT TCC GTG CTG CTC 209 

LAQLSOAAKMFEDVRCKCIC 46 
CTG GCG CAC CTO TCA GAG CCC GCC AAO AAT TTC GAG GAT GTC AGA TCT AAA TGT ATC TGC 269 

PPVKBWSGHIYMKNXSQKDC 6$ 
CCT CCC TAT AAA GAA AAT TCT GGG CAT ATT TAT AAT AAC AAC ATA TCT CAG AAA GAT TCT 329 

DCLHVVEPMPVRGPDVEAYC 96 
GAT TGC CTT CAT GTC GTG GAG CCC ATG CCT GTG CGO GGG CCT GAT GTA GAA OCA TAC TGT 389 

LRCECKYEERSSVTIKVTII106 
CTA CCC TCT GAA TGC AAA TAT GAA GAA AGA ACC TCT GTC ACA ATC AAC GTT ACC ATT ATA 449 

^YLSILGLLCLYMVYLTLVEWfi 
ATT TAT CTC TCC ATT TTG GGC CTT CTA CTT CTO TAC ATC GTA TAT CTT ACT CTG GTT GAG S09 

PIt*KRRLFCHA0LI0SD0Dri46 
CCC ATA CTC AAG AGO CGC CTC TTT GGA CAT CCA CAG TTC ATA CAG ACT GAT GAT CAT ATT 569 

C0HQPPANAH0VLARSRSRA166 
COO GAT CAC CAC CCT TTT GCA AAT GCA CAC GAT GTG CTA GCC CGC TCC CCC ACT CGA GCC 629 

NVLNKVBYA0QRWKLQV0E0 186 
AAC CTC CTG AAC AAG CTA GAA TAT GCA CAC CAO CCC TGG AAG CTT CAA GTC CAA GAG CAG 689 



RKSVFDRHVVLS • 
CGA AAC TCT GTC TTT CAC CGG CAT CTT GTC CTC ACC TAA 



199 
726 



TTCCGAArroAATTCAAGGTGACTACAAACAAACACCCAGACAACTGGAAAGAACTGAC T G CC l' IT CA T 807 

TrTAATACCITCTTCATTTCACCAACTG'nXiCTUGAACATTCAAAACTOGAACa i 1 [ I ' lirrC T 886 

TCTTAAOH'AATAATACAGACATTTTTAAAACCACACACCTCAAACTCACCCAATAA GICi 1 1 iCLTA TTTCTCACTTT 965 
TACTAATAAAAATAAATCroCClXJTAAATTATCTTGAA G r CCl i lA CCTCGAACAACCACT ClC ' n " n"lx:A CCACATAG 1044 
rrTTAACTtGACTTTCAACATAATTTTCAG GCmTTG TT Ol Ibl U i i 101 1 ' ilii I Ibn ' l iUG ' mGG AGAOGCC 1123 
ACOCATCarrCGGAAGTCCTTAACAACnTTTTCAACTCACTTTACTAAACAM 1202
ATTTTCGACrrTCATTTATATTrrCCACTCTACCCACCCrCATCAAAGACCrCACr^ 1281

ctctcttatctcgctatctgcrctgtcrrccacttcatcctaaaccgcatctaaaatccc 1360 
caoattttcttcatctactctgatctctcatco%atccatcctagaaca;^ 1439 

CTAAACATACTC li 10 10 lUi ;'C'ilACrCATCTTCTACTACCTTTAACCACAAATCCTAACCACTTCCACACTTG 1 518 
CAATAAAGAAATTTTATTTTAAAAAAAAAAAAAAAAAAAAACTCCCCCCCC XS69 



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GTCGACCCACXIOGTCCGGG< 




M A S L W

ATG GCG AGC CTA TGG 



5 
73 



CGNLLRLGSGLSMSCLALSV 25
TGC GGA AAC CTG CTG CGG CTG GGC TCG GGG CTC ACC ATG TCC TGC CTG GCG CTG TCG GTG 133 

LLLAQLTGAAKNFEDVRCKC 45 
CTG CTG CTC GCG CAG CTG ACA GGC GCC GCC AAG AAT TTT GAA GAT GTG AGA TGT AAA TGC 193 

ICPPYKENSGHIYNKNISQK 65
ATC TGC CCT GCC TAT AAA GAG AAT TCT GGG CAC ATT TAT AAT AAG AAT ATA TCT CAG AAA 253

OCOCLHVVEPHPVRGPOVEA 85 
GAT TGT GAT TGC CTT CAT GTC GTG GAG CCC ATG CCT GTA CGG GGA CCT GAT GTA GAA GGA 313 

YCLRCBCKYEERSSVTIKVT105 
TAC TGT CTA CGC TGT GAA TGC AAA TAG GAA GAG AGA AGC TCT GTC ACA ATC AAG GTT ACC 373 

r IIYLSILGLLLLYMVYLTL12S 
ATT ATA ATT TAT CTC TCT ATT TTG GGC CTT CTG CTT CTG TAC ATG GTA TAT CTT ACC TTA 433 

VEPII>KRRLFGHSQLLOSD0 145 
GTT GAG CCC ATC CTG AAG AGG CGC CTC TTT GGA CAC TCC CAG CTG TTG CAC AGC GAT GAT 493 

DVGDHQPPAIIAH0VLARSRS165 
GAC GTT GGG CAT CAC CAG CCT TTT GCA AAT GCC CAT GAT GTG CTG GCC CGC TCT CGC AGC 553 

RAMVLNKVEYA00RWKI.QVQ185 
CCA GCC AAT GTT CTA AAC AAG GTG GAC TAC GCT CAG CAG CGC TOG AAG CTC CAG GTC CAC «13 

E QRKSVFDRHVVLS* 200 
GAG CAC CCA AAG TCT GTC TTC CAC CGA CAC GTT CTC CTC ACC TAA 6SB 

CTCGGAACTGGAATCAOGTGACTACCAAGAACACGCAGACAACTGGGAAGAATTCTCTCGCTCT 737 

CC ATOL; LUI i il lA CAA A f C Crri X ri'U^ATGGAGGAAGACTCCAAACTGGAAGCAAACCCCATG 816 

CTTAATATATTAATAGAGACATTTTTACAGCACACAGTTCCAACTCAACCAGTAAGTCrm 895 

CrAATAAAATrAACCT CCC r G T CA GTT ATCf T GA AOCC CCG l TjCC TG G AACAAG C rC^ 974 

TAA CTTCG T GlT C A AGATAACTTCCAG GlXJlX; ' irriIU.llUr^ ^ ^ ^ 1053 

GGCA UiOCi^G ACTAG ClTCl ' CA A CrG TCTT T T CCA CACAGACrTATOAATACr^^ 1132 

AATGTCCCAGTCTAGCT CGCl l Ol 'CACCCTCCTCGCCTCCCCACTTGA Cl 1 1 li; CACTCACTACATTACCTAAGATTCT 1211 

CCTTACCCTGTG CC TGCATTTCATCACCACTTCCATCTCAAATGCCrGCC CCC TCCTCA^^ 1290 

TCCACrCTCATCTCTGACCCAACATGrrCTAGAACAOACTCGCCATCrccrAGTrr^^ 1369 

CAGTCTCTGTCCTCTTCCTCATLi'lCl rCTACTAGCrCTAACCACTrCAACATTTACAATAAACACATTTTCTCTTAAG 1448 



CCCAAGCCTCCCTGGATCArrCACCTACAAATACTCATCACCCTTTT C T C T Cri UC ' IXI AGACGCA O I I L ' i l IliA ACTGA 1527 




TCTCCCCAJCTrrGAACAAGCACTACACTTCAGATTCCCTCTCT C T C ACAACTCTAACAGTTArr^ I i 1606 
1 1 iLlTCClA C A T CCTCTO t X MVATCnJUCftATAAAATAATTTACA^^ 1$81



GTCGACXrCACGCGTCCCCTCTCyVGTCACCXXyVATCT^ 79

GCGGCCTCTTCG CnYltyl t JG CCXyrCCCCGCGCr^^ IS 8 

MIRCGLACE 9 
CCCCTCCGCrCCGCTCC3GCTCGGCCCCGCCCXGCCCCTCAAC ATG ATC CGC TGC GGC CTG GCC TGC GAG 227 

RCRHILPLLLLSAIAFDZIA 29 
CGC TGC CGC TGC ATC CTO CCC CTO CTC CTA CTC AGC GCC ATC GCC TTC GAC ATC ATC GCG 287 

LAGRGHLQSSDHGQTSSLWW 49 
CTG GCC GGC CGC GGC TGG TTG CAG TCT AGC GAC CAC GGC CAG ACG TCC TCG CTO TGG TGG 547 

KCSQEGGGSGSYEEG CQSLM 69 
AAA TGC TCC CAA GAG GGC GGC GGC AGC GGG TCC TAC GAG GAG GGC TGT CAG AGC CTC ATG 407 

EYAWGRAAAAMLPCGFIILV 89 
GAG TAC GCG TGG GGT AGA GCA GCG GCT GCC ATG CTC TTC TGT GGC TTC ATC ATC CTG GTG 467 

ICFILSFFALCGPQNLVPI,R109 
ATC TGT TTC ATC CTC TCC TTC TTC GCC CTC TGT GGA CCC CAG ATG CTT CTC TTC CTG AGA 527 

VIGGLLALAAVF0XISLVZYI29 
GTG ATT GGA GGT CTC CTT GCC TTG GCT GCT GTG TTC CAG ATC ATC TCC CTG GTA ATT TAC 587 

pVKVTOTFTLHANPAVTYIY 149 
CCC GTG AAC TAC ACC CAG ACC TTC ACC CTT CAT GCC AAC CCT GCT GTC ACT TAC ATC TAT 647 

MWAYGFGWAATIILrGCAFFX69 
AAC TGC GCC TAC GGC TTT GGC TGG CCA CCC ACG ATT ATC CTG ATT GCC TGT GCC TTC TTC 707 

PCCLPWYEDDLLOKAKPRYP189 
TTC TGC TGC CTC CCC AAC TAC GAA GAT GAC CTT CTC GGC AAT GCC AAC CCC ACG TAC TTC 767 

Y T S A * 194
TAC ACA TCT GCC TAA 782 

CTTGCCAATGAATGTGGGAGAAAAlXX ; cr G CT C CTGACATGGACTCCAGAAGAAGAAAC 86 1 

AACCCArrTTTTCGCACTCTTCATATTATTAAACrACrrCAAAAATCCrAAAATAAri^^ 94 0 

AGTCTTATA GrrrCA TCTTTATCrmATT AiG ' iri-lUXtjJ^ X019 

ATTTCCTTATATCrATCCATAACATTTATACTACATTTCTAAGACAATATCCACCTCAAACTTAACAO 1098 

AAAATGA CGTT T CCA ACATTrAATAATCTCATCAA G TT C TT C T rA TTTCaiAATACAATGCAC^ 1177 

TAAOGAGAAGAOGAAGATAAGCrrAAAAGTIXTrrAATCACCAAACATTCrAAAAGAAATGCAAAAAAAA;^^ 1256 

CAACCCrrCCAACrATTTAAOCAAAGCAAAATCArrTCCrAAATCCATATCATTTCT^ 1J35 

CAATCATTCATTTTACCTAACCCrTCATCTTGACTayVTATCTCATCTACCAAAGTACTATr^ 1414 

TCCCATAGTTUCT A ACC Ll I ItCM iA AGTCTCAAATATTTACATGAA Ai 11 ILIU 1 1 lA AA Gl ICA 1 lA TACGCTTA 1493 

GCGTCTQCCAAAATCCTATATTAATAAATCTCTA O I U 1 i V T O iU 1 1 1 AT AiU I ' I ' L ACAACCAGAGTACACTGGATTGAA 1572 



ACATgCACrCCCTCTAATTTA7CAT(^CTCATACAT C IX Ai rT A A C TT C 7 C T A CTAAAGClTTACCACCC^ ^ 1651 
CACAAAAGTGCCACTAAAACAGCCTCAGC»U»UTAAATGAC^^ 1730

TATitfSftCMG CnTC T GA TA G r r T GC AACTOTAA^^ LlT'i m X AA T A AACA. 1809 

OATTTTAAATCnrCnrGATATAAAAaiTGCCAC^ 1888 

ATCGGATAGGTOlTTATGArrTTTTACCATTTCSACTTACA^ 1967 

TTTT GT AA G TT C T GG AAAAAGCTAATTGTA bi 1 1 4>C aTTATGAA bi 1 1 ILL GAATAAAOaiCSGGCATrCTAAAAAAAAA 3046 

AAAAAAAAAAAGGGCGGCCGC 
2067 
GTOOACCCACGGGTCCGGOacrCTGAGTCACOCXaAATCAAGGTtm^




79 




158 




MLRCGLACE 
JCCCXKXGCCACXMACGAC ATC CTG CX3C TGC GGC CTG GCC TGC CSAG 



9 

226 



RCRWILPLLLLSAIAFDIIA 29 
CGC TGC AGG TGG ATC CTG CCC CTG CTG CTG CTC AGC GCC ATC GCC TTC GAC ATC ATC GCG 286 

LAGRGWLQSSNHIQTSSLWW 49
CTG CCC GGC CGC GGC TGG CTG GAG TCT AGC AAC OVC ATC CAG ACA TCG TCG CTT TGG TGG 346

RCFDEGGGSGSYDDGCQSLM 69
AGG TGT TTC GAC GAG GGC GGC GGC AGC GGC TCC TAC GAC GAT GGC TGC CAG AGC CTC ATG 406

EYAWGRAAAATLFCGFIILC 89
GAG TAC GCA TGG GGA CGA GCA GCT GCA GCC ACG CTT TTC TGT GGC TTT ATC ATC CTG TGC 466

ICFILSFFALCGPQMLVFLR 109
ATC TGC TTC ATT CTC TCG TTC TTC GCC CTG TGT GGA CCC CAG ATG CTT GTT TTC CTG AGA 526

VraGLLALAAIF0riSLVIVl29 
GTC ATT GGA GGC CTC CTC CCA CTG GCT CCC ATA TTC CAG ATC ATC TCC CTG GTA ATC TAC 586 

PVKyT0TFRLHDNPAVKYiyi49 
CCC GIG AAG TAC ACA CAG ACC TTC AGG CTT CAC GAT AAC CCT GCT GTT AAT TAC ATC TAT 646 

KWAYGFGWAATIZLIGCSFF169 
AAC TGG GCC TAT GGC TTC CCA TOG GCG GCC ACC ATC ATC TTG ATT GGT TOT TCC TTC TTC 706 

FCCLPNYEDOLIiGAAKPRYF189 
TTC TCC TGC CTC CCC AAC TAC GAG GAT GAC CTT TTG GGG CCC GCC AAG CCC AGG TAC TTC 766 

Y P P A * 194 
TAT CCC CCA GCC TAA 781 

TCTCGGACCAAGAGCClXjAGAAAAicX: c rC C TGCAACATGOATCTC^^ 860 

ACCTTTGGGCAATCTTCATATCATCACAAATCCrAGAATAAATGCTAAAGAAAATTCTTCATAATTAG^ 939 

ATCTAT G T C CT C TC C ACTTAAAAAGACTTGAA inHr rC lTnX rrAAGTATATCCTA AI 1 1 i ILLl lA TCTCAATTCTATA 1018 

CCATrTAACCTTCATrTCTTAAAGAATATGCCrCTGAAACTTCATAAGCTACAAATCT X097 

CTCATGCGC dTCllil 1 11 ItCA CATACAATCC O i iUl A ILiG CTAAGCCCTACAGACCAGGAAACTCACTGCCAAAAC 1176 

rrCCCTCACCAAATATCCrCAAATTACTATTTTTTTAAAAAGACCTTATTTrCACTTTTC^ 1255 

ACCAaA nXX;r TT C CTAAGTGACCATC ClT r C T C ACA Ai ri ' l T A CTCA OlUl 1 1 ILA ACAATT Al lU i 1 IITCTA ACCT 1334 

laiiViiG A C TT'l' CrC T C ATCCCTACAAAACTCTrCTAACGTAG 1413 

GA Al 1 1 ICCiH 1 1 iLLLbiA GTGTACAGGGGTA CCC T G T GOG AAGAAG CCGlXilTA CCACATCT C T A CTA 1492 

CTATGCTTACAACCACCCTACACCOCATCCGACCATOCACrACGCCTAATCCCTCCCAACTCGTGCATCT^^ 1571 
PJOGTAGGMCGCACACakS^^ 1650

TTCTCA b ^ itiCri ' CrX ' C CCtTAACTGACXn' ^ 1729 

TAAITAAAACCTGGTCTTCCTTGGTAAGCAGACTTAAAATATC^ 1808 

TGTCrCTGAATAOlTACCGGAACGGCTACrATTACCrTTTCC^ 1887 
TTAACTATCAGAACACTATTTTGTAAGGTGCTXjCAAAGACAC^^ 

TtnrCAAAAAAAAAAAAAAAAOUWUUUAAAAAAAAAAAAAAAAAAAAAAAAAAGGCXGGC^ 2030 
GTOGACXrACGC^i c ai U L'L UC U^gCTCTCTCC^^ 79

MAGIPGLLFLLF 12 
GGGCTGCTCGGCGCGGAAawrrGCTCGGC ATG GCA OGG ATT CCA GGG CTC CTC TTC CTT CTC TTC 144 

FLLCAVGOVSPYSAPWKPTW 32 
TTT CTG CTC TGT GCT GTT GGG CAA GTG AGC CCT TAC ACT GCC CCC TGG AAA CCC ACT TGG 204 

PAYRLPVVLPQSTLMLAKPD S2 
CCT GCA TAC CGC CTC CCT GTC GTC TTC CCC CAG TCT ACC CTC AAT TTA GCC AAO CCA GAC 2^4 

FGAEAKt.EVSSSCGPQCHKG 72 
TTT GGA GCC GAA GCC AAA TTA GAA GTA TCT TCT TCA TGT GGA CCC CAG TGT CAT AAO GGA 324 

TPLPTYEBAKOVLSYBTLTA 92 
ACT CCA CTG CCC ACT TAC GAA GAC GCC AAG CAA TAT CTG TCT TAT GAA ACG CTC TAT GCC 384 

NGSRTETQVGIYILSSSGOG 112 
AAT GGC AGC CGC ACA GAG ACG CAG GTG GGC ATC TAC ATC CTC AGC ACT ACT GGA GAT GGG 444 

AQKROSGSSGKSRRKROZYG 132 
GCC CAA CAC OGA GAC TCA GGG TCT TCA GGA AAO TCT CGA ACG AAG CGG CAG ATT TAT GGC 504 

YDSRFSIPGKOFLLNYPPST 152 
TAT GAC AGC AGG TTC AGC ATT TTT OGG AAG GAC TTC CTG CTC AAC TAC CCT TTC TCA ACA 564 

SVKLSTGCTGTLVAEKHVLT 172 
TCA GTG AAG TTA TCC ACG GGC TGC ACC CCC AGC CTG CTG GCA GAG AAG CAT GTC CTC ACA G24 

A.AHCIHDGKTYVKGTQKLRV 192 
CCT GCC CAC TGC ATA CAC GAT GGA AAA ACC TAT CTG AAA GGA ACC CAG AAG CTT GQA GTG 684 

GFLKPKPKDGGRGAKOSTSA 212 
GCC TTC CTA AAG CCC AAG TTT AAA GAT OCT OCT CGA GGG GCC AAC GAC TCC ACT TCA CCC 744 

MPBQMKFOWIRVKRTHVPKG 232 
ATC CCC GAG CAG ATC AAA TTT CAG TOG ATC CGG GTC AAA CCC ACC CAT GTG CCC AAG GGT 804 

WIKGNAKOIGMDYDYALLBL 252 
TOG ATC AAG GGC AAT CCC AAT CAC ATC GGC ATG CAT TAT GAT TAT CCC CTC CTC GAA CTC 864 

KRPHKRKPHKZGVSPPAKQL 272 
AAA AAC CCC CAC AAC ACA AAA TTT ATC AAC ATT GCC GTG ACC CCT CCT CCT AAG CAG CTG 924 

PGGRIHPSGYOKDRPGNLVY 292 
CCA CCC CCC ACA ATT CAC TTC TCT CCT TAT CAC AAT CAC CGA CCA CGC AAT TTC CTG TAT 984 

RFCDVKOBTYOLLYQOCDAO 312 
CCC TTC TGT GAC CTC AAA CAC CAC ACC TAT GAC TTC CTC TAC CAC CAA TCC CAT CCC CAC 1044 

PGASGSCVYVRMHKRQOOKW 332 
CCA GCC GCC ACC CGC TCT CCC GTC TAT CTC ACG ATG TGC AAC ACA CAC CAG CAG AAC TCC 1104 

BRKirCIFSGKOWVOMNCSP 352 
CAC CGA AAA ATT ATT GGC ATT TTT TCA CGC CAC CAG TGC CTC GAC ATG AAT CCT TCC CCA 1164 
QDFKVAVRITPLKYAQICYW 372
CAG GAT TTC AAC GTG CCT GTC ACA ATC ACT CCT CTC AAA TAT GCC CAO ATT TGC TAT TGG 

IKGNYLDCREG* 384
ATT AAA CCA AAC TAG CTG GAT TCT ACG GAG GCG TGA 1260 

CACAGTGTTCCCrCCriXX;CAGCAATTAACXOTCTTCA'ra 1339 

CGTGCACA O g XG T G TGTGTCT GTG TGT G T G TAAG GrU TCT TA TAATCTTT^ 1418 

GGCTTTACTATTTGAAAA C IT U Gmi^'ltjT A TCATATCaTATATCATTT;^ 1497 

ATAAAAAAAATACTGATTTG GG GCAATGAGGAATATTTGACAATTAAGTTAATCTTC^ ^ 157£ 

TTATTTCATCnXBUUnrCTTTCAAAGATTTATATrAAATATTTOGCAT^ 1655 

GTGlVXVl T C TT C T G AGATTm iX:TlM3G T a;at;CG ^ 1734 

TAAGGCAGTGTTCCCATTTAGGAACTTTGACAGCATTTCTTACGCAGAATATTTTGGAT 1813 

G ' iV m xa ACAGTAAAATG Al l Ultiri i G ACTATACTCATAa^C^^ 1892 

CCTCCTTTTACTTCCAAAAATACTTTCTTTTCCAAACCTTGTTGCTCTACrr^^ 1971 

CCAA LlTrA AAGTCATACCAQAOTOOCCAAGAUlXilTlATCCCAACt^ 2050 

GGAACTACCTA mTrC AGAAGAOUlTAATCAGGGCTTAATTAGAACAOGCTCT ^ I rCCrLUa tf;CAAACA ST TCT GG 2129 

CCACACTAAAAACAATCATAGCATTTTACCCCTO G ATTATAGCACATCTa iiU 1 1 ' A TC A TT TGG ATOQAGTAATTTA 2208 

AAATGAATTAAATTCCAGACAACAATCGAAGCATTCCCTGGCAGATGTCACAACAG^ 2287 

GCACAGTCCTCCAGCCTGATCAAAAATTATTCTGCATAOTTTTC^ 2366 

TCGAAACTTTT C T C T C TCATTTATACTGAAAATA Cri^G AAGTTACTTTAACAAAAC^^ 2445 

GCTTTAAAAGCGCCGCTTTTCCTCGAATGCTCrAGGTTATAGATAAACAATTAGCTAT^^ 2524 

AACAATOaUAATGGATCAGAATCATCC C TT CCA ATAAAGG CCXnT ACAC Al^ ^ ^ ^^ ^ 2603 

ACCATATACaGAAAACACrTCGACrTATtCT ATOI Tm » TTTTAT^^ 2682 

TCCGAGAAAAAATCAAATGGACTACAAGCACCTGTTTGCntrrCCT^ 2761 

TAAGCATATTCACATCGACCACTCTCACTTAGACArTCrClXiGGGC ^ 1 1 lLiU,i AU T Cn rCTlX;A C Cl 1 1 1 ' lUi AA 2840 

GGATAATT C TCATAAGGCACTCAACAAACGTACAACCACA Gl X a: TT TCT T CA A^ 2919 

AACCACATGCACAGCCCCXAGGAAAATTCTGACTTCCAGCAC^ 2998 

AACAAGCGAGCTCTCCATTrCT ATO T C T CC TATTrGGG ^ I IGLi r i A C C Tl GC T GA AAAAAACTTC 3077 

ACTCAACACCAACACCAGAATCGA' r iT ' r I ' T TAAAAAAATAGAT Ol I L Cl U I blU AAGCACCTTCATTCCTTGA ii 1 IG 3 1 S 6 

ATTTTTTCCAAACTTAGACAATGCCACAAACTO%AAATCAAATCAATCTT^ J 23 5 

GAATGATACACCCATATGCTATATACACCTTAACTCACACAACTGTAAAAGAAAAl^^ 3314 

TCTTTTTAGTGATAATAAAACAAAGCATCCTArrAAACTATCATAGAAGTACACACAAAAACAA 3393 
ATTATTAATATAATTAUl'UCTrrACAlXaWtTAEnTATACA^ 3472

ACAiyiXXX!AAAGTCTGCTCCTTAAACACrCATO(X'frATGAl^^^ 3 5 5 X 

AGGAAGATGCCTCTCCAITrrCC ClCrCTr f A ^ 3630 

TGTTGTAAAGGGACAAGTTGAGGTTCTAAAATCTGCATTTAAAT^^ 3709 

GGCCG 3714 
GTCGACCaVCGCCTCCCaXSACGaOTGGCACTCOGCCACTCTGCGCy^^



CGCOCACACCTCTCICMGCGGOCKACGGCCXiCGGCCI 




M A

'AGCACTCACC ATG GCT 



2 

153 



GIPGLFILLVLLCVFMQVSP 22 
GGA ATC CCG GGG CTC TTC ATC CTT CTT GTC CTG CTC TGT GTG TTC ATG CAG GTG ACT CCC 213 

YTVPWKPTMPAYRLPVVLPQ 42 
TAC ACC GTT CCG TCG AAA CCC ACA TGG CCG GCT TAT CGC CTC CCT GTA GTC TTG CCT CAG 273 

STLNLAKADFDAKAKLEVSS 62 
TCT ACC CTC AAC TTA GCT AAG GCA GAC TTC GAC GCC AAA GCG AAA TTG GAG GTG TCC TCC 333 

SCGPQCHKGTPLPTYEEAKQ 82
TCA TGT GGA CCT CAG TGT CAC AAG GGA ACA CCA CTG CCC ACC TAC GAA GAG GCC AAG CAG 393 

YLSYBTI.YAMGSRTETRVGI X02 
TAC CTT TCC TAT GAA ACC CTT TAT GCC AAT GGC AGC CGC ACA GAG ACT CGG GTG OGC ATC 453 

YILSNGEGRARGROSBATGft 122 
TAC ATC CTC AGC AAT CGT GAA GGC AGG GCA CGA GGC AGA GAC TCG GAG GCC ACA GGG AGA 5X3 

SRRKRQXYGYDGRFSIFGKD 142 
TCT OCC AGG AAG AGO CAG ATT TAT GGC TAC GAT GGC AGG TTT AGC ATT TTT GGG AAG GAC 573 

FLt*IIYPFSTSVKi:.STGCTGT 162 
TTC CTG CTC AAT TAT CCT TTC TCA ACA TCG GTG AAG TTG TCT ACT CCC TCC ACT GGC ACC 633 

LVAEKHV, LTAAHCIHDGKTY X82 
CTG GTG GCA GAG AAG CAC GTC CTC ACT GCT CCC CAC TGC ATA CAC GAT GCG AAA ACC TAT 693 

VKGTQKLRVGFLKPKYKOGA 202 
GTG AAA GGG ACA CAG AAA CTC CGA GTG GGC TTC CTG AAG CCC AAG TAT AAA GAT GGT GCC 753 

EGDNSSSSAMPDKMKFQWIR 222 
GAA GOG GAC AAC AGC TCG AGC TCA GCC ATG CCA GAC AAG ATG AAG TTT CAG TGO ATC CGC 8X3 

VKRTHVPKCMrKONAMDIGM 242 
CTC AAA CGC ACC CAT CTC CCC AAC CGC TCC ATC AAG GCC AAT GCC AAT GAC ATC GCC ATG 873 

DYOYALLELKKPHRROFMKZ 262 
GAT TAT GAC TAC GCC CTG CTC GAA CTC AAG AAA CCC CAC AAA AGA CAG TTC ATG AAG ATT 933 

GVSPPAKQLPGGRtHFSGYD 282 
OCT GTG ACT CCT CCA CCG AAG CAG CTC CCA GGG GCC AGG ATC CAC TTC TCT CGT TAT GAC 993 

NORPCWLVYRFCDVKDETYD 302 
AAT CAC CGC CCC CCC AAT TTC CTC TAC CGC TTC TCT GAT CTC AAA GAT GAG ACC TAC CAC 1053 

t-LYOOCDAQPGASGSGVYVR 322 
CTT CTC TAC CAC CAG TGT GAC GCC CAC CCC GCG GCC ACT OCT TCA GOG GTC TAT CTG AGC X113 

MWKRPQOKWBRKIICIFSCH 342 
ATC TOG AAG AGA CCA CAG CAC AAA TOG GAA AGA AAA ATT ATC OCC ATC TTT TCA GGG CAC 1173 

OWVDMNCSPQD FNVAVRITP 362 
CAG TCC CTC GAC ATG AAT CCC TCT CCA CAC CAT TTC AAC GTC CCA CTT AGA ATC ACG CCT 1233 



WO 00/18904

24/112

PCT/US99/22817



LKYAOICYWIKGNYLDCRBG 382 
CTT AAA TAT GCC CAG ATT TGC TAT TCG ATT AAA GGA AAC TAC CTA GAT TGC AGG GAG GGG 1293 

• 383 

TGA 1296

CATG CUTCri ' C ' i T G CCaGCACCaATGGT CVm ^' rtf CACT 1375 

GXGTGAGTCACATAfiTATCTTTTACCTAGTATTCTTCAAATGGCAAAAATTATTCGCrAT^ 14 54 

GTGCGTTATAGCATTTAAGCAGTCTOAAAGCATACTTTTGCATAGAGACTTTAAACT 1533 

GACAAGGAAGrrAAACTTTCA GTrmtJG AgAATTCT ^ 1612 

ATAayrGACAGACAGGOAATATGAATTCTTATGTTTCTATATGTATATGTTTTCT^ 1691 

TTGTAATGTCri XR JlT A TTATGCTTCCAGATAATGATAGCAAAGTCTT^ 1770 

CATTTACGTAGTAGTCCrTGAAGAGAACAATAATTTATT^ 1849 

ACAGAATTCCCACG C iOCT 11 rAU ' i'Tl ' 1 ' LI AAAATAAAACTTTC L L 1 TO r A AAAAAAA A AAAAAAAA A AAAGGGCGGCOG 1928 
MAPASRLLALWALA 14
GTCGACCCACGCCTCCGCGCTC ATC GCC CCC GCG TCG CGG TTC CTC GCG CTC TGG GCG CTG GCG 64 

AVALPGSGAEGOGGWRPGGP 34 
GCT GTG GCT CTA CCC GCC TCC GCG GCC GAG GGC GAC GGC GOG TGG CCC CCG GGC GGG CCG 124 

GAVAEEERCTVERRADLTYA 54 
GGG GCC GTG GCG GAG GAC GAG CGC TGC ACG GTG GAG CGT CGG GCC GAC CTC ACC TAC GCG 184 

BFVQQYAFVRPVILQGLTDN 74 
CAO TTC QTO CAG CAC TAC GCC TTC GTC AGG CCC GTC ATC CTG GAG GGA CTC ACQ GAC AAC 244 

SRFRALCSRDRLLASPGDRV 94 
TCC AGG TTC CGG GCC CTG TGC TCC CGC GAC AGG TTG CTG GCT TCG TTT GGG GAC AGA GTC 304 

VRLSTAIITYSYHKVDI.PFQE114 
GTC CGG CTG AGC ACC GCC AAC ACC TAC TCC TAC CAC AAA GTG GAC TTG CCC TTC CAG GAG 364

YV6QLLHP0DPTSLGNDTLY 134 
TAT GTG GAG CAG CTG CTG CAC CCC CAC GAC CCC ACC TCC CTC GGC AAT GAC ACC CTC TAC 424 

FFGDNIIFTBWASLFRHYSPP1S4 
TTC TTC GGG GAC AAC AAC TTC ACC GAG TGG GCC TCT CTC TTT CGG CAC TAC TCC CCA CCC 484 

PFCLLGTAPAYSFGZACACSI74 
CCA TTT GCC CTG CTC GGA ACC GCT CCA GCT TAC ACC TTT OGA ATC CCA GGA GCT GGC TCG 544 

GV P FKWHC PGYS EV I Y GRKR 194 
COG GTC CCC TTC CAC TOG CAT GGA CCC GGC TAC TCA GAA GTG ATC TAC CGT CGT AAC CGC 604 

W FLYPPEKTP BFHPNKTTLA214 
TGG TTC CTT TAC CCA CCT GAG AAC ACC CCA GAG TTC CAC CCC AAC AAG ACC ACG CTG GCC 664 

WLROTYPALPPSARPLECTI234 
TOG CTC CGC CAC ACA TAC CCA CCC CTG CCA CCC TCT CCA CCC CCC CTC GAC TCT ACC ATC 724 

RAGeVLYFPDRWWHATLNLD254 
CCG GCT GCT CAG CTG CTC TAC TTC CCC GAC CCC TCG TCC CAT CCT ACC CTC AAC CTT CAC 784 

TSVFISTFLC* 265 
ACC AGC CTC TTC ATC TCC ACC TTC CTC CCC TAG B17 

CCAAAACAGCTGCCACCACTCCaXrrCACACACCAGCACCTCCCACCTCGTGCTCACCGA^^ 896 

GCCCCAATGGCCTCAGCCCACCCCACCCTCACCTCCTTTTCCAGCCCACAAAOCCCGACCATC^ 975 

GATCCTCACAGCOGAAACACTCCAGACTCCAACACCAGAACrTCCCGCAACCGCTCCCOCTCCCCACCA^ 1054 

TCTATAGOCGCCCCGCGCTTCTCCCCAOGCCTCCCCTOGACCACGACGCCACCTACGCCACCCAACCTC^ 1133 

CACCCAGCCATTCTCAGACATCAATCCCTCAATAACCTCCTTCATACCCAACTTCOGGATC^ 1212 

CCCC7CC0GGTO\CCCGCTCAAAATCACCCACACCCTCCACTCACAACAA0CGCAGACCCCAGTCATW 1291 

CATGCCACTgCCCCTCCTCCCCCACCCCCAGCCCTCACCTCCAGCTCCTCCTCG A T C TC C r rC ^ 1370 

CACraJGCCTCATGCACCCCTCCCCCATCAGCTCAAACCTCATCTTCCCACACACCTACT 1449 






1528 

FlTCCACCTt»CAAAAAAGCTCOTCCATC7rCCK 1607 

llCACCCCACCTCCCTCCrCCATGGGGCACA^ 1686 

CGTGCA»a^a«5aCCCACATGT G<K rrG G GGGG^ 1765 

CGTtaSCT G T C GTCCTCATCACCCTCGTGG- lT r aJ C^^ 1844 

AOATGGACCTCGCCAGATGTCnXSACCACACCCCAATCTCA^ 1923 
GTAAAGCCTTCGATAAACAAAAAAA A AAAAAAAAAAAAGGGCGGCCG 1970 
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MAAAGRRGLLLLPV 14 
GTOQACCCACGCGTCCCGTTC ATG GCG GOG GCT CGG OGO CCC GGT CTC CTT TTO CTC TTT GTA 63 

LWNMVTVILPASGSGGHXQN 34 
CTA TGG ATG ATG GTO ACT GTG ATT CTG CCT GCC TCTGGCGAAGGGGGATGGAAACAG AAT 123 

GLGIAAAVMEEERCTVERRA 54 
GGG CTG GGA ATT GCA GCA GCA GTA ATG GAG GAG GAG CGT TGC ACA GTO GAG CGT CGG GCA 183 

HITVSEFMQHYAFLKPVILQ 74 
CAC ATC ACG TAG TCC GAA TTC ATG CAG CAC TAT GCC TTC CTC AAG CCC GTC ATC TTG CAA 243 

GLTONSKPRALCSRENLLAS 94 
GGA CTC ACG GAC AAC TCG AAG TTC CGG GCC CTG TGT TCC CGG GAA AAC CTG CTA GCC TCG 303 

FGDNIVRX.5TAffTYSYQKVD114 
TTC GGG GAC AAC ATT GTT OGC TTG AGT ACA GCC AAC ACC TAG TCC TAC CAQ AAA GTG GAC 363 

LPP0BYVB0LLQPQDPASLG134 
CTG CCC TTC CAG GAA TAT GTG GAA CAG CTG CTG CAG CCC CAG GAT CCT GCA TCC CTA GGC 423 

NDTI«YFPG0filllFTEWASLPQlS4 
AAT GAC ACC CTC TAC TTT TTT GGA GAC AAC AAC TTC ACT GAG TCG GCA TCC CTC TTC CAG 483 

HYSPPPFRI'LGTTPAVSFGI 174 
CAC TAC TCT CCG CCA CCA TTC CCT CTC CTG GGA ACC ACC CCT GCT TAC ACC TTT GGA ATT 543 

AGAGSGVPFHWUGPGP5EVI194 
GCA GGA CCT GGA TCT GGG GTA CCC TTC CAC TCG CAT GGG CCT CCT TTC TCA GAG GTT ATC 603 

yGRKRllFLYPPEKTPEFHPN214 
TAT GGT CGG AAG CGC TCG TTC CTC TAC CCT CCT GAG AAG ACA CCT GAG TTC CAC CCT AAC 663 

KTTLAWLLEIYPSLALSARP234 
AAC ACC ACA TTG GCC TGG CTG CTC CAA ATA TAC CCA TCT CTA GCC CTG TCA CCA CGG CCT 723 

LECTIOAGBVLYFPDRWWHA2S4 
CTA GAA TGT ACC ATC CAC CCT CCT GAA CTA CTG TAT TTT CCT CAT CCC TCC TCC CAT CCC 783 

TLNLDTSVFISTFLG* 270 
ACA CTC AAT CTG GAC ACC ACT CTC TTC ATT TCT ACC TTC CTT GGC TAG 831 

CO^GACACCCAACTCCCAACCCCACrGCACCACCACATCCCAATCTAGTCCTCACAGACTTrA 910 

GCACCACCAACCTCAGCCCACCCTCACCCACTCTCCACCCCAGAAGCOCCACAACGGACGCTCATG 989 

TATCCTGAGAACGCCACCACTTCAGAACCCATCACCAGCCCCCATCCOCCCAGCCCCACGCACACAAACT^^^ 1068 

CTGGAGCrrCarrCTCCAGATCCTCCTCGCCCACGCTCCCACCCACGACATGGGC 1147 

TTCTCAGACATCAAACCCTCAATCA C ' n ' CC T TCA TGCCCAACTTCGCCATCAC C T G rrC CTGCCTCAAACGCCTCCCGG 1226 

TCACACCCTCAAAGTCCCCCACACCCTGCAACAGACTCAACAG-lVnCAATCCCCTGA^ 1305 

CX:CTCTCCATCCCCCCCTCTCCATCCCCCCn VC I"l ACCTCCACGTCCrCCTC ^ ; 1 OLGG 'l'CATAGGTGATACC 1384 

AJTCCCTCTAATCCAGGGTTCCCCCATCACCTCAAACCTAATCTTGCCACACAACTACTCACCCATATCTCC C i IL I 'AT 1463 
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AGOlCftAGGOQAAAATGTCTJ 




iTACCKa^CCAGCCGMGt^^ X543 



TCCTCA LVm ' LTm ' CTa^TC CaCCIGftGAIjAAGACC^ 1621 

CaTGTGTCmilAACTCCrcnrTCC^ 1700 

acato::accaaaggctggggcacttttcatgcca^ 1779 

CTCACUlXiLTl G GCCTCAATGCAGGCCTtX^^ 1858 

ACrCCTCCAOTTCCCTGACGGTTAACCAGAAGCTACnTO 1937 




•CCTTCAATAAAAACACrixn^CTGGTGACTCAGTGT 2016 



CTGCTGOCCGACCXywrCXAi 




2095 



GCGGCOG 
2102 
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CACOCOTCCGGCTGCXXXSAGCACGAGGATCGGCGA^ ATG GAT AAC CGT TTT GCT 



70 



TAFVIACVtSLlSTIYMAAS 26 
ACA GCA TTT GTA ATT GCT TOT GTC CTT AGC CTC ATT TCC ACC ATC TAC ATG GCA GCC TCC 130 

IGTDFWYBYRSPVOEMSSDL 46* 
ATT GGC ACA GAC TTC TGG TAT GAA TAT CGA ACT CCA GTT CAA GAA AAT TCC ACT GAT TTG 190 

NKSIWDEFISDEADEKTYND 66 
AAT AAA AGC ATC TGG GAT GAA TTC ATT ACT GAT GAG GCA GAT- GAA AAO ACT TAT AAT GAT 250 

ALFRY KGTVGLWRRCITIPK 86 
GCA CTT TTT CCA TAC AAT GGC ACAGTGGGATTaTGGAGACGGTGTATC ACC ATA CCC AAA 310 

MMHWYSP PERTESFOVVTKC106 
AAC ATG CAT TGG TAT ACC CCA GCA GAA AGG ACA GAG TCA TTT GAT GTG CTC ACA AAA TGT 370 

VSPTLTE0FMEKFVDPGNHN136 
GTG ACT TTC ACA CTA ACT GAG CAG TTC ATG GAG AAA TTT GTT GAT CCC GGA AAC CAC AAT 430 

SGtDLLRTYLHRC0PC*tPPV146 
ACC GGG ATT GAT CTC CTT AGG ACC TAT CTT TGG OCT TCC CAG TTC CTT TTA CCT TTT GTG 490 

SLGLMCFGALtGLCACICRS166 
ACT TTA CGT TTC ATG TGC TTT GGG GCT TTG ATC GGA CTT TGT CCT TGC ATT TCC CGA ACC 550 

LYPTIATGtLHLLAGNYSDS186 
TTA TAT CCC ACC ATT GCC ACQ GGC ATT CTC CAT CTC CTT GCA GGA AAT TAC TCA GAT TCT 610 

W L H E • 191 

TGG CTC CAT GAA TAA 625 

TTTTAATGATC'l' i C r A CATTATCCTTGATAATTACTC An '' rCi ' C AATAAT Ci''i ' i iA ATTTCATCCCATGACTCTGAGGA 704 
TACCTTCCAACCTCTTTAA A 'l't a ^CCirACAAACTCATTCCCAA Gri ' C^^ 783 
CCACTCCCCCATCCCTATCGTACTTTAAAAACATCCCCTTAAAATCCTTCCATCAATCTO 862 
CTTGAATCTACC C T GGC TT G T CAl ' GGll r r OA CCAATAGACTCItXrTCAAATGACACTCTT CrCA TCACCTCCTAAAG 941 
ATCAlX i 'rOTCCr r A AACCA G - riVrC T'r C GAACACTCA Ct 1 L i 1 A CAACATTCCCT C TCCAAACCCACATACCATGCTCTG 1020 
AACTCCACCCCACATCCACCTCTCCTCTCTACATCCrCCACCTCAAATCa 1099 
TCATTrCCACCCATGTGTOCCACCCATCCTCCATCTCCACCCTrAACAACCCrrOAOAGC^ 1178 
TCTrACTACATCCrrCTCAGACrCTAATAAACAACCAACTACCTCAGCCCAATCAACCTATCGAACTC 1257 

ATGAATTCTTCrrTTGTCCCGCTAAAAAAAAAAAAAAAAAAAAAAAAAAAA 1308 
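
The figures in this section pair each line of cDNA codons with the one-letter amino acid residues it encodes. As a minimal sketch of that codon-to-residue correspondence, assuming the standard genetic code (the names `CODON_TABLE` and `translate` are illustrative and do not come from the application), the final coding line of the figure above, TGG CTC CAT GAA TAA, can be translated as follows:

```python
from itertools import product

# Standard genetic code, built in TCAG order (an assumption of this sketch;
# the application does not state which translation table was used).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): residue
    for codon, residue in zip(product(BASES, repeat=3), AMINO)
}


def translate(cdna: str) -> str:
    """Translate a cDNA string codon by codon, stopping at the first stop codon.

    Whitespace is ignored; unrecognized codons (for example OCR damage)
    are reported as 'X'.
    """
    seq = "".join(cdna.split()).upper()
    protein = []
    for i in range(0, len(seq) - 2, 3):
        residue = CODON_TABLE.get(seq[i:i + 3], "X")
        protein.append(residue)
        if residue == "*":  # stop codon ends the open reading frame
            break
    return "".join(protein)


print(translate("TGG CTC CAT GAA TAA"))  # prints WLHE*
```

Reporting unknown codons as `X` rather than raising an error is a design choice of this sketch, so that a damaged line can still be compared against the residue line printed above it.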
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M D N R 4 

AATTCGG^1WC^IXlCKGVVGC^/VGCCGGTGGACr^GAG^^^ ATG GAT AAC CGT 75 

FATAFVIACVLSLISTIYMA 24 
TTT GCT ACT GCG TTT GTG ATT CCT TGT GTG CTT ACT CTG ATT TCC ACC ATC TAC ATG GCG 135 

ASIGTOFWYfiYRSPIQSNSS 44 
GCC TCC ATA OCC AOG GAC TTC TOG TAT GAG TAT CGA AGT CCC ATT CRA GAG AAT TCA ACT 195 

OSNKIAWEDFLGOEADEKTY 64 
GAC TCG AAT AAA ATC GCC TGG GAA GAT TTC CTC GGT GAC GAC GCG GAT GAG AAG ACT TAC 255 

NOVLFRYNGSI.GLWRRCITZ 84 
AAC GAT GTT CTG TTC CGA TAC AAC GCC AGC TTG GGC CTG TGG AGA CGG TGC ATC ACC ATA 315 

PKNTHWYAPPERTESFDVVT104 
CCC AAA AAC ACT CAC TGG TAT GCG CCA CCG GAA AGG ACA GAG TCA TTT GAT CTG GTT ACC 375 

KCN5FTLllEQPMBKYVDPGKri24 
AAA TGC ATG AGT TTC ACA CTA AAC GAG GAG TTC ATG GAG AAG TAT GTG GAC CCC GGC AAC 435 

HNSGXOLLRTYLWRCOPLLP144 
CAC AAT AGC GGC ATC GAC CTC CTT CGC ACC TAC CTG TOO CGC TGC CAC TTC CTT TTA CCC 495 

FVSLGLMCPGALrCLCACIC164 
TTC GTC AGC TTG GGC TTG ATG TGC TTT GGG GCG TTG ATT GGC CTC TOT GCC TGT ATC TCC 555 

RSLYPTLATGILHLLAGLCT184 
CCC AGC CTC TAT CCC ACC CTC OCC ACT GGC ATT CTC CAT CTC CTT GCA CGT CTG TCC ACA 615 

CGSVSCYVAGieLLHQKVEL204 
CTO CCC TCC CTG ACT TGC TAT GTT GCC GGC ATT GAA CTC TTA CAT CAG AAA GTA GAG CTG 675 

PKDVSGEFGWSFCLACVSAP224 
CCC AAG GAT CTA TCT GCA GAA TTT CCA TOG TCC TTC TCC CTG GCC TGC GTC TCG GCT CCC 735 

L0FMAAALFIWAAHTNRKBY244 
TTA CAC TTC ATC GCC CCC CCT CTC TTC ATC TCC GCT CCC CAC ACC AAC CCC AAA GAG TAC 795 

TLMKAYRVA* 254 
ACC TTA ATG AAC CCT TAT CGT GTG CCA TGA 825 

ACCCAGC CnJCCnUC a TA ATGATTAATATTTrTCATAC AnT rT t - r 871 
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HDH3UI tAHCO 215 

Input file tag215; Output File tag215.pab; Sequence length 2747 

MBLGCHTOLG 10 
TCCC(:»GTAGACGCTCaXX::ACXAGCOT ATG GAG CTG GGT tGC tOG ACQ CAG TTG GGG 66 

LTFLQLZiLISSLPREYTVItf 30 
CTC ACT TTT CTT CAG CTC CTT CTC ATC TCG TCC TTG CCA AGA GAG TAG ACA GTC ATT AAT 126 

EACPGAEWNIMCRECCEYOO 50 
GAA GCC TGC OCT GGA OCA GAG TGG AAT ATC ATG TGT CGG GAG TCC TGT GAA TAT GAT CAG 186 

lECVCPGKRBVVGYTIPCCR 70 
ATT GAG TGC GTC TGC CCC GGA AAG AGG GAA GTC GTG GGT TAT ACC ATC CCT TGC TGC AGG 246 

NSBNECDSCLIHPGCTIFEN 90 
AAT GAG GAG AAT GAG TGT GAC TCC TGC CTG ATC CAC CCA GGT TGT ACC ATC TTT GAA AAC 306 

CKSCRNGStfOGTLDDFYVKG 110 
TGC AAG AGC TGCCGAAATGGCTCATGGGGGGGTACCTTGGATGACTTCTATGTQAAG GGG 366 

PYCAECRAGWyGGDCMRCGO 130 
TTC TAC TGT GCA GAG TGC CGA GGA GCC TOG TAC GGA GGA OAC TCC ATO COA TOT GGC CAG 426 

VLRAPKGQILLESYPLNAHC 150 
GTT CTG CGA GCC CCA AAG GGT CAG ATT TTG TTG GAA AGC TAT CCC CTA AAT GCT CAC TGT 486 

BNTIHAKPGPVIQLRFVMLS 170 
GAA TGG ACC ATT CAT GCT AAA CCT GGG TTT GTC ATC CAA CTA AGA TTT GTC ATG TTC AGC 546 

LBFDYMCOYOYVSVROGDITR 190 
CTG GAG TTT GAC TAC ATC TGC CAG TAT GAC TAT CTT GAG GTT CGT GAT GGA GAC AAC CGC 606 

DGQIIKRVCGNERPAPIQ5I 210 
CAT GGC CAG ATC ATC AAG CGT GTC TGT GCC AAC GAG CGG CCA GCT CCT ATC CAG AGC ATA 666 

GSSLHVLFHSDGSKKFDGFH 230 
GGA to: TCA CTC CAC CTC CTC TTC CAC TCC GAT GCC TCC AAG AAT TTT CAC GGT TTC C^^ 726 

AIYEBITACSSSPCFKDGTC 250 
GCC ATT TAT GAG GAC ATC ACA GCA TGC TCC TCA TCC CCT TGT TTC CAT CAC CCC ACG TGC 786 

VLDKAGSYK CACLAGYTGOR 270 
GTC CTT GAC AAC GCT CGA TCT TAC AAG TGT GCC TCC TTC CCA GGC TAT ACT GGG CAG CCC 846 

CBHLLSERNCSDPGGPINGY 290 
TGT GAA AAT CTC CTT GAA GAA AGA AAC TGC TCA GAC CCT GGG GGC CCC ATC AAT GGG TAC 906 

QKITGGPGLIMCRHAKICTV 310 
CAC AAA ATA ACA GGG GCC CGT CGG CTT ATC AAC CGA CCC CAT GCT AAA ATT CCC ACC CTT 966 

VSFFCYNSYVLSGKEKRTCO 330 
CTC TCT TTC TTT TGT TAC AAC TCC TAT GTT CTT ACT CGC AAT GAG AAA AGA ACT TGC CAC 1026 

ONGEWSGKQPICXKACREPK 350 
CAGAATGGACAGTGGTCAGGGAAACAGCCCATCTUCATAAAAGCCTGCCGAGAACCA AAG 1086 

ISD:.VRRRVLPM0V0SRETP 370 
ATT TCA CAC CTG GTC ACA ACC ACA CTT CTT CCC ATG CAC CTT CAG TCA ACG CAG ACA CCA 1146 
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LHQLYSAAFSKOKLQSAPTK 390 
TTA CAC CM CTA TAC TCA GCG CCC TTC AGC AAG CAC AAA CTG a«; ^ 1206 

KPALPFGDLPMCyQHLHTQI. 410 
AAG CCA GCC CTT CCC TTT GGA GAT CTG CCC ATG GGA TAG CAA CAT CTQ CAT ACC CAG CTC 1266 

QVBCISPFYRRLGSSRRTCL 430 
CaO TAT GAG TGC ATC TCA CCC TTC TAC CCC CGC CTG GGC AGC AGC AGO ACG ACA IGT CCG 1326 

RTGKWSGRAPSCIPICGKIB 450 
AGG ACT GGO AAG TGG ACT GGO CGG GCA CCA TCC TGC ATC CCT ATC TGC GGG AAA ATT GAG 1386 

MITAPKTQGLRWPHQAAITR '470 
AAC ATC ACT GCT CCA AAG ACC CAA GGG TTO CGC TGG CCG TGG CAG GCA GCC ATC TAC AGO 1446 

RTSGVHDGSLHKGAWFLVCS 490 
AGG ACC AGC GGG GTG CAT GAC GCC AGC CTA CAC AAG GGA GCG TGG TTC CTA GTC TGC AGC 1506 

CALVNERTVVVAAHCVTDLG 510 
GGT GCC CTG GTG AAT GAG CGC ACT GTG GTG GTG GCT GCC CAC TGT GTT ACT GAC CTC GGG 1566 

KVTMIKTADLKVVLGXFYRO 530 
AAG GTC ACC ATG ATC AAG ACA GCA GAC CTG AAA GTT GTT TTG GGG AAA TTC TAC OGG GAT 1626 

DDRDEKTIQSLOISAIZLHP 550 
GAT GAC CGG GAT GAG AAG ACC ATC CAG AGC CTA CAG ATT TCT GCT ATC ATT CTG CAT CCC 1686 

tTYOPILLOADIAlLKLLOKA 570 
AAC TAT GAC CCC ATC CTC CTT GAT GCT GAC ATC GCC ATC CTC AAG CTC CTA GAC AAG GCC 1746 

RISTRVQPI CLAASROLSTS 590 
CCT ATC AGC ACC GGA GTC CAG CCC ATC TGC CTC GCT CCC ACT CGG GAT CTC AGC ACT TCC 1806 

PQBSHITVAGWHVLADVRSP 610 
TTC CAC GAG TCC CAC ATC ACT GTG GCT GGC TGG AAT GTC CTC GCA GAC GTG AGG ACC CCT 1866 

GPKNDTLRSGVVSVVOSLLC 630 
GGC TTC AAG AAC GAC ACA CTC CCC TCT GGG GTG CTC ACT GTG GTG GAC TOO CTC CTG TGT 1926 

EBOHEDHGIPVSVTDMMFCA 650 
GAG CAG CAC CAT GAG GAC CAT CCC ATC CCA GTG ACT CTC ACT CAT AAC ATC TTC TCT CCC 1986 

SWSPTAPSDICTABTGGIAA 670 
ACC TOG CAA CCC ACT GCC CCT TCT GAT ATC TCC ACT GCA GAG ACA GCA CCC ATC GCG GCT 2046 

VSFPGRASPEPRWHLMGLV5 690 
CTG TCC TTC CCC GGA CCA GCA TCT CCT GAG CCA CGC TOG CAT CTC ATC GGA CTC CTC ACC 2106 

WSYOKTCSHRLSTAFTKVLP 710 
TCG AGC TAT CAT AAA ACA TGC AGC CAC ACG CTC TCC ACT CCC TTC ACC AAC GTC CTC CCT 2166 

FKDHIERNMK* 721 
TTT AAA CAC TCG ATT CAA ACA AAT ATC AAA TCA 2199 

ACCATCCTCATCCACTCCTTCACAAO'l U i i I C rG T A TATC C CT C TCTACCTCTGTCATTG C GTGAANCACTCTGCGCCT 2278 

CAACTGTCATTrCCCCTCTCAACTTCCCTCTCCCACCCC MCI U\CTTCACCCACAAAACTCACTGAACGCTGACTACA 2357 

CCTCCATTGCTCGTAGGCTGATCCCVCCTCCACTACTACGACAGCCAArrcCAAGATCCCACGGCTTGCAAG^ 2436 
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rnvntaA AGAAGACCAT 



nTAOGAAT 2515 



CCCCATCTCTXtJTACACATTITAATAAAATAAGCKmXSG^^ 2747 
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IVCCTCCGC 

MGGPRGAGWVAA 12 

CCGGOGCTOCGGCTCTCCCCCGCCXXX»CC ATG GGT GGC CCC CGG GGC GCG GGC TGG GTG GCG GCG 145 

GLLLGAGACYCIYRLTRGRR 32 

GGC CTG CTG CTC GGC GCG GGC CCC TGC TAG TGC ATT TAC AGG CTG ACC CGG GGT CGG CGG 205 

RQOR&LGIRSSKSAGALEEG 52 

CGG GGC GAC CGG GAG CTC GGG ATA CCC TCT TCG AAG TCC GCA GGT GCC CTG GAA GAA GGG 265 

TSEGQLCGRSARPQTGGTWE 72 

AOG TCA GAC GGT CAG TTG TGC GGG CGC TCG GCC CGG CCT CAG ACG GGA GGT ACC TGG GAG 325 

SQWSKTSOPEDLTDGSYODV 92 

TCA CAG TOG TCC AAG ACC TCG CAG CCT GAA GAC TTA ACT GAT GGT TCA TAT GAT GAT GTT 385 

LNAEOLOKLLYLLBSTEOP V 112 

CTA AAT GCT GAA CAA CTT CAG AAA CTC CTT TAC CTG CTG GAG TCA ACG GAG GAT CCT GTA 445 

riERALXTLGNNAAFSVMQA 132 

ATT ATT CAA ACA CCT TTG ATT ACT TTG GGT AAC AAT GCA GCC TTT TCA GTT AAC CAA GCT 505 

riRELGGIPIVANKINHSMQ 152 

ATT ATT CCT GAA TTG GGT GGT ATT CCA ATT OTT GCA AAC AAA ATC AAC CAT TCC AAC GAG 565 

SIKBKALHALNKLSVNVENQ 172 

ACT ATT AAA GAG AAA GCT TTA AAT GCA CTA AAT AAC CTG ACT GTG AAT GTT GAA AAT CAA 625 

rKIKiyiSOVCBDVFSGPLN 192 

ATC AAG ATA AAC ATA TAC ATC ACT CAA GTA TCT GAG GAT CTC TTC TCT GGT CCT CTG AAC 685 

SAVQLAGLTLLTNMTVTMDH 212 

TCT CCT CTG CAG CTG GCT GGA CTC ACA TTG TTG ACA AAC ATG ACT GTT ACC AAT GAC CAC 745 

OHMLHSYITDLFOVLLTOMG 232 

CAC CAC ATG CTT CAC ACT TAC ATT ACA CAC CTC TTC CAC CTC TTA CTT ACT CCA AAT GGA 805 

MTKVQVLKLI.LMLSeNPAMT 252 

AAC ACG AAG CTC CAA CTT TTG AAA CTC CTT TTC AAT TTG TCT GAA AAT CCA CCC ATG ACA 865 

EGLLRAOVDSSPLSLYOSHV 272 

CAA GCA CTT CTC CCT GCC CAA CTC CAT TCA TCA TTC CTT TCC CTT TAT CAC ACC CAC CTA 925 

AKBILLRVLTLFONIKNCLK 292 

CCA AAC CAG ATT CTT CTT CCA GTA CTT ACC CTA TTT CAC AAT ATA AAG AAC TCC CTC AAA 985 

lEGHLAVQPTFTECSLFFLL 312 
ATA GAA CCC CAT TTA CCT CTC CAC CCT ACT TTC ACT GAA CCT TCA TTC TTT TTC CTC TTA 1045 

MGEECAQKIRALVDHHOAEV 332 
CAT GCA GAA CAA TCT CCC CAC AAA ATA ACA CCT TTA CTT CAT CAC CAT CAT CCA CAC CTG 1105 



K E X V V T I I P K I • 
AAC CAA AAC CTT CTA ACA ATA ATA CCC AAA ATC TCA 



344 
1141 



TTCCTCATAT 



:CAAACAGTAATCCACTCTCGATATAAATCTArrrrCTCTCTTCC7TATAACCCCATTCrCCCAC 1220 
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CTGCTAAATTTAAAa«rrAAATATCAC A?mit; ' lCT TTJ^ 1299 

ACT AlT r rG ATGCCAACTGAATATAAGAgCTIXjTACTCAAAC^ 1378 
GTTATCTTCCCTACATGAAGTGGCaGTAACCTn riCACATTTAAGCTACCCTTCTACXTTTTGAAGTGAI 1 IIjCAGTT 1457 

ACTCATCTCAGACAGCATCAGTArrTGACTAAATCATT^^ 1S3 

ATCCTAAGCnCTTGAGGCCATTCACCTGCCAACCTGACCATAC^ 1615 

TTTCGTCACTTCTAGTaUTGAAAAATGTAAAC^^ 1694 

TACATATAAAATAGTGTOATaUVTCACAATGTCCATCTT^^ 1773 

ccGTGcroGGCGCUiitxKrirnticcigf^ issi 

GTTTGAGACCAAGCCTCACCAATATGGAGAAACCCTOr^^ 1931 

GCCTGTAATCCCAGCTACTTGGGAGGCCGAOGCACXS^^ 2010 

ATACCGCCATTGCACTCCAGCCTCOGCAAauaAGC^ 2089 

TGTOCTTAAOTGGAAAflATATCTATGAAATATG GiCilil ill n ' A AAACACAAAAATTATAGAATATGGG A TC C OGTGTG 2168 

TCTCTGTtrrcT GTOr o T GT a T m vii gn^^ 2247 

CTAaAATGATACCCAAACTCCTOGAGTtXXSACttX^ 2326 

AATATGACCCCAAATTCTATA AiC ' HT 1 i i iA ATAAAOCXS(»QAAAAATCAAAAAAAAAAAAAAAAACOGCGGCOGC 2403 
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ise 



M G G A R 5 

GCTCCA LXrfGCG T G TG G CCTTgOCTCLn'OGGC T CCXrT^^ ATG GOT GGC GOO OGC 229 

DVGWVAAGLVLCAGACYCIY 25 

GAC GTG GGC TGG GTG GCA GCA GGG CTC GTC CTC GGC GCC GGC GCC TGC TAC TGT ATC TAG 289 

RLTRGPRRGVATMRPSRSAB 45 

CGG CTG ACT CGG, GGA CCG CGG CGA GGC GTC GCG ACC ATG COC CCT TCG CGA TCC GCA GAA 349 

DLTDGSYOOILNABQLKKIiL 65 

GAC CTA ACC GAT GGC TCC TAT GAC CAT ATC TTA AAT CCA GAG GAG CTT AAG AAA CTT CTO 409 

YLLESTDDPVITEK ALVTLG 85 

TAT CTG CTG GAG TCA ACC GAC GAT CCT GTC ATT ACT GAA AAG GCC TTG GTC ACC TTC GGA 469 

NMAAFSTNQAIZRELGGIPI 105 

AAT AAT GCA GCC TTC TCC ACT AAC GAG GCC ATT ATT CGT GAG TTG OCT GOT ATC CCA ATT 529 

VGNKIKSLNQSIKBKALNALI25 

CTT GGA AAC AAA ATC AAC TCC CTG AAC CAA AGT ATT AAA GAG AAA GCT TTA AAT OCA CTG 589 



JI MLSVNVB M OTKI K I YVPQV 
AAT AAC CTC ACT GTC AAT CTT GAA AAT CAA ACT AAG ATA AAG ATA TAC GTC CCT CAA GTC 

C E D V F A D 
TGT GAG GAC GTC TTT CCT GAC 



145 
649 

152 
670 
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10 20 30 40 SO 

HOMilM MALtSRPALT LLLLIJlAAVVRCQEQAQTTDWRATLKTIiWGVHKIDr/U^^ 

I . .Ill ■ »••« -.,,,,,»,«-.-••♦••-••»•••■•••••••• 

MuRlKIS M'VTPRPAPARGPALLLLLLIATARGQEQDQTTDWfUVTLKTrRNGlKKIDTYLNAALDLL 
10 20 30 40 50 

60 70 80 90 100 110 

GGEDGLCQYKCSDGSKPFPRYG7KPSPPNGCGSPLFGVHLNIGIPSLTKCCNQHDRCYET 

GGEDGLCQYRCSDGSKPVPRYGYKPSPPNGCGSPLFGVHLNIGXPSLTKCCNQHDRCYST 
60 70 80 90 100 110 

120 130 140 150 160 170 

CGKSK:^^X:DEEFQYCLSKICRDVQKTLGLTQHVQACETTVELLFDSVXHLGCKPYLDSQR 

CGKSKNDCDEEFQYCLSKICRDVQKTLGLSQNVQACETTVELLFDSVIHLGCKPYLDSQR 
120 130 140 150 160 170 

180 190 
: AACRCHYEEKTDL 



AACWCRYEEKTDL 
180 190 
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10 20 30 40 50 60 

Hc/Ai P€ MAQU3AWAVASSFFCASLFSA\mKIEEGHIGVVYRGGALLTSTSGPGFHUMLPFITS^ 

MAQLCAVVAVASSFFCASLFSAVHKISEGHIGVr/RGGAIXTSTSGPGFHLMLPFlTSYK 
10 20 30 40 50 60 

70 80 90 100 110 120 

SVOTTLQTDEVKi^ryPCGTSGGVMlYFDRIEVVWLVPNAVYDIVrar/TADYDKALI 

SVQTTLQTDEVK^IVPCGTSGGVMIYFDRIEVVNFLVPNAVYDIVK>rrrADYD 

70 80 90 100 110 120 

130 140 150 160 170 180 

HHSlJ^QFCSVHTLQEVYIELFDQrDENLKIJVLQQOLTSMAPGLVIQAVRVTKPNrPEAIR 

HHELNQFCSVHTLQEVYIELFTOIDENLKIJUJQaDLTSMAPGLVIQAVRVTKPNIPEAIR 
130 140 150 160 170 180 

190 200 210 220 230 240 

RNYEUMESEKTKLLIAAQKQKVVEKEABTERKKALIEAEKVAQVAEITYGQICVMEKETEK 

RNYELMESEKTKLLIAAQKQKWBKEAETERKKALIEAEICVAQVAEITYGQKVMEKETBK 
190 200 210 220 230 240 
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10 20 30 40 50 60 

f^U^ AU MNMTQARVLVAAVVGLVAVLLYASIHKIEEGHLAVYYRGGALLTSPSGPGYHIHLPFITT 

Mt^Riue — • 



70 80 90 100 110 120 

FRSVQTTLQTDEVKNVPCGTSGGVMIYIDRIEVVNMI^PYAVFDIVR^^ 

KNVPCGTSGGVMIYXDRXEVVNMIAPYAVFDIVIWYTADYDKTLira 

10 20 30 40 

130 140 150 160 170 180 

KIHHEI^QFCSAHTLQEVYIELFOQrDENLKQAtQKDLNimPGLTIQAVRVTKP 

KIHHSLNQFCSAHTLQEVYIELFDQrDENLKQALQKDLhlTHAPGLTIQAVRVTKPKIPEA 
50 60 70 80 90 100 

190 200 210 220 230 240 

IRRNFSUlEA£KTiaLXAXQRQKVVEKEA£TERKKAVIEA£KIAQVAKXRFQQK^ 

IRRNFELMEAEKTRIXXAAQKQKVVEKEAETERKRAVrEAEKIAQVAKXRFQQK^ 
110 120 130 140 150 160 

250 260 270 280 290 300 

EKRXSEIBDAAFLAREKAKADAEYYAAHKYATSNKHKLTPEYLELKKYQAIASNSKIYFG 

EKRISEXEOAAFLAREKAKADAEYYAAHKYATSNKHKLTPEYLELKKYQAIASNSKXYFG 
170 180 190 200 210 220 

310 320 330 340 

SNIPNMFVDSSCALKYSDrRTCRESSLPSKEALEPSGENVIQNKESTC- 

SNXPSMFV0SSCALKYS0GRTGRE0SLPPEEAREPSGE5PXQNKENAGN 
230 240 250 260 270 
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10 20 30 40 50 60 

Muei K>€ MKLLCXVAVVGCI^VPPAQANKSSEDIRCKCrCPPYI^ 

h»c;M4M MKLI^LVAVVGCLLVPPAEANKSSEOIRCKCICPPYIUflSGHIYNQNVSQKDra 

10 20 30 40 50 60 

70 80 90 100 110 120 

PMPVPGHDVEAYCIXCECRYEERSTTTIKVIIVIYLSVVGALLLYMAFLMLVDPL^^ 

PMPVPGHDVEAyCLtCECRYEERSTTTIKVIIVIYLSVVGAJULLY 

70 80 90 100 110 120 

130 140 150 160 170 180 

AVTEQLHNEEENEDARTMATAAASrGGPRANT^rt.ERVEGA(X}RWKLQVQEQRKT^ 

AyTEQLHNEEENEDARSHAAAAASU»3PRANTVI.ERVEGA(^RWKLQVQEQRKTV^ 

130 140 150 160 170 180 



MLS 
MLS 



F«C, 2-5 
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10 20 30 40 50 

HC/M A N MATLW-GGLLRLGSLLSLSCLALS VLLLAQLSDAAKNFEDVRCKCICPPYKENSGHIYNK 

M^Rl KiC KASLWCGNLI^GSGLSMSCLALSVLLLAQLTGAAKNFEDVRCKCICPPYKEMPGftt 

10 20 30 40 50 60 

60 70 80 90 100 110 

MISQiaxrDCLHVVEPMPVRGPDVEAYCLRCECKY£ERSSVTIKVTIIIYLSIMI.IJ^ 

NISQKIXriXriilVVEPMPVRGPDVEAyCLRCECKYEERSSVTIiCV^ 

70 80 90 100 110 120 

120 130 140 150 160 170 

VYLTLVEPZLKRRLFGHAQLIOSDDOXGOHQPFA^1AHDVIJU^RS!U^^1^^ 
■■*••*■*■■*•••*•■•••■•••••••*«••«•*•******•' 

VVLTLVEPItKRRI.FGHSQLLQSDDDVGDHQPFANAHDVLARSRSRANVLI^ 

130 140 150 160 170 180 

180 190 

KLQVQEQRXSVFDRHWLS 



KLQVQEQRKSVFDRHWLS 
190 
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10 20 30 40 50 60 

MIRCGIACERCRWILPIJilXSAIAFDZIAIJ^RGWLQSSDHGOTSSLWWKCSQEGGK^ 
••*z«:::****«i:7i5>s«>*«>{******«*>>a«»>a ••••••••• 

MUlCGIJlCERaiWILPLLLX.SAXAFDIIAlAGRGWLQSSl^^ 

10 20 30 40 50 60 

70 80 90 100 110 120 

YEEGCQSLMEYAWGRAAAAMLFCOFIILVICFriiSFFALCGPQMLVFIJtVIGmj,AIJ^ 

YDDGCQSIJ^EYAWGRAAAATLFCGFZILCICFILSFFALaSPQMLVFIJtVZGGZJ^^ 

70 80 90 100 110 120 

130 140 150 160 170 180 

FQIXSLVIYPVfCYTQTFTIJCANPAVTyiyilWAyGFGWAATIILIGCAFFFCCXPNYEDD^ 



FQIISLVIYPVKYTQTFRIJnJNPAVNYiyNWAYGFGWAATIILIGCSFFFCCLPNYED 

130 140 150 160 170 180 

190 

LGNAKPRYPYTSAN 



LGAAKPRYFYPPAN 
190 
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10 20 30 40 50 

MUiclKIS MAGXPGL-FILLVLIXVFWQVSPYTVPWKPTWPAyRLPVVtPQSTUILAKADFOAXAi^ 

I I ' m • • m •••• •,*••■«■•**««««*••••■■••••■••>■■•■■• 

HU K AN MAGIPGLLFLLr FLLCAVGQVSPySAPWKPTWPATRLPVVLPQSTLNL^ 

10 20 30 40 50 60 

60 70 80 90 100 110 

VSSSCG PQCHKGTPLPTYSEAKQYLSVETLyANGSRTETRVGIYrLSNGEGRARGRDSEA 

VSSSCGPQCHKGTPLPTYEEAKQYLSYEX^YANGSRTETQVGIYILSSSGDGAQKHDSGS 
70 80 90 100 110 120 

120 130 140 150 160 170 

TCRSRRKRQIYGYTCRFSIFGKDFLLNYPFSTSVKLSTGCTGTLVAEKHVLTAAHCIHTC 

SGKSRRKRQIYGYDSRFSIFGKDFLLNYPFSTSVKLSTGCTGTLVAEKHVLTAAHCIHTC 
130 140 150 160 170 180 

180 190 200 210 220 230 

KTYVKGTQKLRVGFLKPKYKDGAEGDNSSSSAMPDKMKFQWIRVimTHVPKGWIKOT 

KTYVKGTQKLRVCFLKPKFKDGGRGANDSTSAMPEQMKFQWIRVKRTHVPKGWIKGMAND 
190 200 210 220 230 240 

240 250 260 270 280 290 

IGMDYDYALLELKKPHKRQFMKIGVSPPAKQLPGGRIHFSGYDNDRPGNLVYRFCOVKDE 

IGMDYDYALLELKKPHKRKFMKIGVSPPAKQLPGGRIHFSGYDNDRPGNLVYRFCDVKDE 
250 260 270 280 290 300 

300 310 320 330 340 350 

TYDLLYQQCDAQPGASCSCVYVRMWKRPQQKWERKIICrFSCHQWVDMNGSPQOF>IVAVR 

TYDLLYQQCDAQPGASCSGVYVRMWKRQQOKWBRKIIGIFSCHQWVDMNGSPQDFNVAVR 
310 320 330 340 350 360 

360 370 380 

ITPLKYAQ ICYWIKCiNYLDCREC 



ITPLKYAQ ICYWr KGi^nXDCREG 
370 380 
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10 20 30 40 50 

HUMi* M MAPASR LLALWAUUVVALPGSGAEGDGGl^TRPGG PC A VAEESRCTVERRADLT 

Mt^l^N e MAAiW3RRGLLLLFVLWMMVWILPAS---GEGGWKQNGLGIAAAVMEEERCTVS^^ 
10 20 30 40 50 

60 70 80 90 100 110 

YAEFVQQYAPVRPVILCKSLTDNSRFFULCSWJRLLASrGDRVVRLSTANTYSYHKVOLPF 



YSEFMQHYAFLKPVILQGLTDNSKFRALCSRENLLASFGDNIVHLSTANTYSYQKVDLPF 
60 70 80 90 100 110 

120 130 140 150 160 170 

QEYVEQLLHPQDPTSLGNDTLYFFGDNNFTEWASLFRHVSPPPFGLLGTAPAYSFGIAGA 

■ ■■*■••« •••»•■•••••••■••*•••••• m m m 9 

• ••>•■■•■■■••••■■•>>■■■>■■■■■•■■■■■■•■■■•■•• 

QEYVEQLLQPQDPASLGNDTLYFFGDNNFTEWASLFQHYSPPPFRLLGTTPAYSFGZAGA 
120 130 140 150 160 170 

180 190 200 210 220 230 

GSGVPPHWHGPGYSEVIYGRKRWFLYPPeKTPEFHPNKTTLAWLROTYPALPPSARPLEC 

GSGVPFHWHGPGF5EVIYGRKRWFLYPPEKTPBFHPNRTTLAWLLEIYPSLALSARPLEC 
180 190 200 210 220 230 

240 250 260 

TIRAGEVLYFPDRWWHATLNLDTSVFISTFLG 



TXQAGEVLYFPDRWWHATLNLDTSVFISTFLG 
240 250 260 
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10 20 30 40 50 60 

MDtmFATAFVIACVLSLISTIYMAASIGTDFV/VEYRSPVQENSSDLNI^IWDEFISDEM 

MDNRFATAFVIACVLSLISTIYHAASIGTDFSr/SYHSPIQEriSSDSNXIAWED^ 

10 20 30 40 50 60 

70 80 90 100 110 120 

EKTYNDALFRYNGTVGLWRRCITIPKNMHWYSPPERTESFDVVTKCVSFTLTEOFM^ 

EKTYNDVLFRYNGSLGLWRRCITIPKNTWVYAPPEHTESFDVVTKCMSFTLNEQFMEKYV 
70 80 90 100 110 120 

130 140 150 160 170 180 

DPGNHNSGIDLLRTYLWRCQFLLPFVSLGLMCFGALrGLCACICRSLYPTIATGILHLLA 

DPGNHNSGIDLLRTYLl^COFLLPFVSLGLMCFGALIGLCACICRSLYPTLATGILHLLA 
130 140 150 160 170 180 

190 200 210 220 230 240 

GLCTLGSVSCWAGtELLHQKLELPDNVSGEFCWSFCLACVSAPLQFKASALFIWAAHTN 

GLCTLGSVSCWACIELLHQfCVELPKDVSCEFGWSFCLACVSAPLQFHAAALFZWAAHTN 
190 200 210 220 230 240 

250 

RKEYTLMKAYRVA 



RKEYTLMKAVRVA 
250 
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10 20 30 40 SO 

MU Ei J06 MGGARDVGWVAAGLVLGAGACYCI YRLTRGPRRCVATM- -RPSRSAEDLTDGSYDDILNA 

HUKAM MGCPRGAGW^/AAGLLLGAGACYCIYRLTRGRFmCDREL^ 

10 20 30 40 50 60 

60 70 80 90 100 110 

EQLKXLLYLLESTDDPVlTEKALVTLGNMAAFSTNQAIIRELGGIPIVGNiaNSLN^ 

EQLQKLLYLLESTEDPVIIERAI-ITLGNNAAFSVNQAIIRELGGIPIVANKrNHSNQSIK 
70 80 90 100 110 120 

120 130 140 150 
EKALiVALNNLSVNVENQTKIKIYVPQVCEDVFA 



EKAIJlAUINLSVNVENQrKIKIVISQVCSDWSGPLNSAVQLAGLTLLTNM^ 

130 140 150 160 170 180 



LHSYITDLFQVVLTGNCOTKVQXLKLrXNUVENPAMTEGLLRAQVDSSFLFLYDXHV^^ 
190 200 210 220 230 240 



XLLQYLRPSE 
250 
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hii a m tntaXign 

ALIGN calculates a global alignment of two sequences 

version 1.0u Please cite: Myers and Miller, CABIOS (1989) 

> ButlSO 1570 aa vs. > fautlSO 

1203 aa scoring matrix: pam120.mat, gap penalties: -12/-4 
55.0% identity; Global alignment score: 2219 
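
The ALIGN header above reports a percent identity for the global alignment. As a rough illustration of how such a figure can be derived from two gapped alignment rows, the sketch below counts identical, non-gap columns and divides by the total number of alignment columns. The function name `percent_identity` and the choice of denominator are assumptions made for illustration; ALIGN itself may normalize differently.

```python
def percent_identity(row_a: str, row_b: str, gap: str = "-") -> float:
    """Percent of alignment columns where both rows carry the same residue.

    Both rows must already be aligned (equal length, gaps marked with `gap`).
    The denominator is the full alignment length, which is an assumption
    of this sketch rather than a documented property of ALIGN.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    if not row_a:
        return 0.0
    # A column counts as a match only if the residues agree and are not gaps.
    matches = sum(1 for x, y in zip(row_a, row_b) if x == y and x != gap)
    return 100.0 * matches / len(row_a)


# Toy example: 4 identical columns out of 6.
print(round(percent_identity("ACGT-A", "ACCTTA"), 1))  # prints 66.7
```

Because OCR damage alters many residues in the alignment rows reproduced here, applying this function to the text as printed would not be expected to reproduce the 55.0% reported in the header.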

10 20 30 40 50 

GTCGAOCCAOGCGTCCG GGCCGGGGTCCTGA- - - - -GCCGGAGCCGGAGCGCGCGCC 

: : • : : . : : : . :;:::: 
GTCGACCCACOCGTCCGCGTGGATATGGAGCnKKTrGCTGCCAAGTCCG^ 
10 20 30 40 50 60 

60 70 80 90 

GCTGCCCAGC CC CGC - - CGCGCCC-GCCCCGCACAT 'GGTGACT 

CCTGCCTAGCGCGTCCTGGGGACTCTtnXXXX^ 

70 80 90 100 110 120 
100 110 120 130 

C - - -CGOGGCCCGC- - -GCCC-OCCCGGS-GCCCOCCGCTC CTCCTOCT 

: :::::::: ::;:::: 

CGTAGAGCCCGGCGCTGCGCGCATGGCCCTGCTCrCGCGOT 

130 140 150 160 170 180 

140 150 160 170 180 190 

CCTGCTGCTCGCCACTGCGCGCGGG- - -CACCAACAGGACCAGACCACCGACTGGAGGGC 

lit : : : . : : :::::.:::: : : : : t t i i 2 t t : : i t x i : . : ; 

CCTCXrrCATCGCCCCrrCTTGTCAGGTGCCAGGAGCACGCCC^ 
190 200 210 220 230 240 

200 210 220 230 240 250 

CACCCTGAAGACCATCCCGAACGGCCTTCATAAGATAGACACG^ 

250 260 270 280 290 300 

260 270 280 290 300 310 

GCACCrGCrGGGCCCGGACGACXXXSCTCTGCC^ 

CCArcTCCrCCCACCCGAGGACCXTrCTCTCC 

310 320 330 340 350 360 

320 330 340 350 360 370 

TOTCCACCCrATGCATATAAACCATCTCCACCAA A TCXXrrGT ^ 

TTTCCCACGTTATCCTTATAAACCXTCCCCACCGAATO^ 

370 380 390 400 410 420 

380 390 400 410 420 430 

CCTTCATCTCAACATACCTATCCCTTCCCTCACCAAGTGCTCC^ 

rGTTCATCrTAACATTGGTATCCCTTCCCTCACAAACTCT rGCAACCAACACGACAGCTG 
430 440 450 460 470 480 

440 450 460 470 480 490 

LTATGAaACCTCCCCCAAAACCAACAACCACTCTGACXJAGCAGTTCC^ 

rrATCAGACCTCTCCCAAAACO>AGAATCACTCTCATCAAaAA^^ 
490 500 510 520 530 540 



fi(k. 32. OofS) 
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SOO 510 520 530 540 550 

CAAGATCTGOUZAOACCrrSCAGM^^ 

;;;::;::ti t;t: : x . t : 3t;;:s.}};;: : :: s;:xts::::; 

CAAGATCTGCa3AGATCTACAGAAAACACTACC»CTAAC^^ 

550 560 570 580 590 600 

560 570 580 590 600 610 

GACAACGGTCGACCTCCTCTTTGACAGCGTCATCCATra^^ 

AACAACAGTCX5AG CTLY IX jm i i ACA G ' lX»n T A TACATTTAG^^ 

610 620 630 640 650 660 

620 630 640 650 660 

CAGCCAGCCGGCTGCATGCTGGTXnx:XjrTATGAA ACC 

: i s . 1 1 j i ::. s ::::::::::::: i . J s s : 

CAGCXIAACGAGCCGCATGCAOCnGTCATTATGAAGAAAAAACTGAT^ 
670 680 690 700 710 720 

670 680 690 700 710 720 

CTCACTGCTGC»GAGCAGGCGA£ZAATGCUVGGAT^ 

... ... ft ••••• aa ■■■■ • a « 

■ JJS«SST«(a«! •••• «•>•*••••••••••■ >••• • 

CCGACAGCTAGTCUl'CAGATGAAGATGGAAOAACATACCl^^ 
730 740 750 760 770 

730 740 750 760 770 780 

TAACAGCCTAA lUriU;cnAG TTTT G T G T C GATGGGTCATT^^ 

:t titt ttitttt i : z ::::::: . 

rrACAACATAAAACTGTCTTATTTTTGTG- • AAAGGATTATTTTCAGACCTTAAAATA* * 
780 790 800 810 820 830 



::::: t t*:t:.t:s M*t*:*t 

. - ATTTATAT CTTCATGTTAAAACCT- ----- -CAAAOCAAAAAAAGTGAGGG 

840 850 860 870 

850 860 870 880 890 900 

ACCATCCTTGCGATCXX^yVGCaAGCAGCACATCCAAGAGCATC 

ACATAC TGAOGCGAOGGCA* - -C GCTTGTCTrC 

880 890 900 

910 920 930 940 950 960 

U IVl tU* i ax :i r CC C CAAACTC GGA AGAAAAGCtTAAC C T ro I CAT 
;:.;::.::;:! I ; j i : . i , ; 

-TCA-CCTATCTTCCCCA CCATT-CCTC CCTTA CTT 

910 920 930 940 

970 980 990 1000 1010 1020 

ACriTCrrACTTAACAATAAAAATCAAACCAAATCTAAAATTCATT^ 



AC7A-TCC CAAATGT CTT 

950 

1030 1040 1050 1060 1070 1080 

ATTATTTrATTrTCAAATACACCCCAATCTTCCCTTAGAACTATTArr^ 



GACCAAT-ATC- - -AAAAACAACTGCTTGTTTAC- 

960 970 980 
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1090 1X00 1X10 1120 . 1X30 1140 

TCAGATGTAGATTTATACCrCCAAAAACTArrAATTCTCCAi i TTTATTATACATAATGT 



-CGGA-GAATTTTGAAAAGAGGAATA TATAACTCAATTTT 

990 1000 1010 1020 

1150 1160 1170 1180 1190 1200 

GTrtmrCTCTQAAOCCCACTAAGATAGGTATAAATATGTTACTC^^ 

— — -CAC- ------------------------- -AAC--CACATTTA 

1030 1040 

1210 1220 1230 1240 1250 1260 

CCAAATGTGCATCT C TT G T A CACnTGGAATCACGGTTCGTACT^ 

CCAAA AAAAGAGATCAAATATAAAATT 

1050 1060 

1270 1280 1290 1300 1310 1320 

CAGGACATCTGA G X G i T GGGATGTGCACAGAArrCAGAAGCCXAG C TT C CT G TCTC^^ 

iS;««a«*BB 

CATCATAATGT CTGTT- - -CAACAT- -TATCT 

1070 1080 1090 

1330 1340 1350 1360 1370 1380 

ACCC CT X A CAGTGA A TGTCCTT C CT C TCCTt K TOT a AGCT^ 



- TATTTO CAAAATOGGGAAATTATC 

1100 1110 

1390 1400 1410 1420 1430 1440 

GGGCCAAGCCGAGCTCTGAATCAGTGCXiCrATCTCCTCCTGA 



A CTTACA ------- AGTATTTOTTTACT ---- - 

1120 1130 1140 

1450 1460 1470 1480 1490 1500 

ATCCC CXi ' m TCCATCTTCTATCCTCGAGTAGTGTTAAAAGT C T G ACAT^ 



- ATGAAAT-TTTAAATAC- - ACATTT 

1150 1160 

1510 1520 1530 1540 1550 1560 

COTCTTAATAAAACCTATTTACTTCTTGGTAAAAAAAAAAAAAAAAAAAAAAAAA 

ATCC- - — CTAC AAAAAAAAAAAAAAAAAAAAAAAGGCC 

1170 1180 1190 

1570 
OOCCC- 

CiGCCGC 
1200 
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10 20 30 

f^MA^ TANGGATCGACCACGCGTYCGCCCACGCGT 

M.U1» ACGCGTCCCCGCy^CGCGTGGGCGCGGACTGATGCCCTCATCGAAGCGACTCXSCCCGG^ 
10 20 30 40 50 60 

40 50 60 70 80 

CCGGTCGCGTCCTGAGQCGTGTGACGGTTT--TC--TTGCTCGTGOGC^^ 

GAAGTAGGGTGCTGAGCGGTGTGCCGGTTTCTACGGTTGCACG^ 

70 80 90 100 110 120 

90 100 110 120 130 140 

GGAGCGCCTCCAGGGACAGCCTCGATAAAGGCTCACTGATGGCTCAGTTGG^ 

»••••••••••••••• *•« ••••••••«•***«•••«•■••• •••• 

•••■•>•••* «•■•••••••••••■• >••«•«••■«■■•••■*«■*•••««■•• 

GGAGCGCCTGGAGGGACAGCCTGGATACAGGTTCACTGATGCCTCAGTT^ 

130 140 150 160 170 180 

150 160 170 180 190 200 

TGGCTGTCGCTTCCAGTTTCTTTT G T C CATCTCTC T TC 

•■■•^•••••••••••■•■••••••■••••••••••••••••••••••«»,,».. 

***•■«•••••■••••«•*•*•••••••••••■»•>■•>*«•*•••••■■>•*•( 

TtXXrCCTCCCrTCCACTTTCTTTT G TGCA^^ 

190 200 210 220 230 240 

210 220 230 240 250 260 

AGGCACATATTCCCGTATATTACACACGCCGTCCCCTCCTGACTTCCArc^ 
•>■••**■•••■• ••■••••■«••••• ■■•»■ • 

•••••••••••••••••••••••••••• •••r«a>>*»a*a« ••••• • 

AGCGACATATTCCAGTATATTACACACCTGGTCCCCrCCTGACXTCCA 
250 260 270 280 290 300 

270 280 290 300 310 320 

GTTTCCATCTCATGCTCCCTTTCATCACATCATATAACTCTGTGCACACCACACrCCAC^ 

GrrrCCATCTCATGCTCCCCTTCATCACATCCTATAACTCTCTAC^^^ 
310 320 330 340 350 360 

330 340 350 360 370 380 

CACATGACCTGAAGAATCTACCrTCTCCCACTACTGCTCCTCTCATC^ 
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CTGATGAACTGAAGAACGTACCATGTGGAACCAGTGGTOTTGTGATGA 
370 380 390 400 410 420 

390 400 410 420 430 440 

GAATTGAAGTGGTGAACTTCCTOrrCCCGAACiXAGTGTAT^ 

GAATTGAAGTGGTGAACTTCCTGGTCCCAAATGCAGTGTATGATATAGTGAAGAACTATA 
430 440 450 460 470 480 

450 460 470 480 490 500 

CTGCTGACTATGACAAGGCCCTCATCTTCAACAAGATCCACCACGAAC^^ 

CTtXrAGACTATGACAAGGCCCTCATCTTCAACAAGATCCATCATGAG^^ 
490 500 510 520 530 540 

510 520 530 540 550 560 

GCAGTGTGCACACGCTTCAAGAGGTCTACATTGAGCTGTTTGATC^ 

*s "5 J! ijiJ*ji«riir2i«iiiiiiiri2* z 

GCAGCGTTCATACTCTTCAGGAAGTCTATATCGAGCTGTTTC 
550 560 570 580 590 600 

570 580 590 600 610 620 

TCAAACTGGCTTTGCAACAGGACCTGACCTCCATGGCCCCIXMG^^ 

2«a«* *••*.•«•••*■■>••■>•••• *••••••••••••»>••••• *« 

TCAAGTTGGCTTTGCAGCAGGACCTGACTTCCATGGCCCC^^ 
610 620 630 640 650 660 

630 640 650 660 670 680 

TGCCGGTAACAAAGCCCAACATACCAGAGCCAATCCCCAGAAACTACGAGTTOATGGAAA 

TGCGAGTGACAAAGCCCAATATACCTGAGGCAATCCGCAGGAACTATGAGCTCAT^ 
670 680 690 700 710 720 

690 700 710 720 730 740 

GTGAGAAGACAAAGCTTCTCATTCCCGCCCAGAAACAGAAGGTGGTCCAAAAGGAA^^ 

GCCAGAACACGAACCTTCTCATTGCACCCCAGAAGCAGAAOCTGGTGGAAAAGGAC^^ 
730 740 750 760 770 780 

750 760 770 780 790 800 

AGACAGAGCGGAAGAAGGCGCTCATTGAGGCAGAAAAAGTGGCCCAGGTGGC^ 

AAACACACAGCAACAAGCCCCTCATTGACCCAGAAAAACTCCCACACGTTGCAGAAATC^ 
790 800 810 820 830 840 

810 820 830 840 850 860 

CCrACCCCCACAACGTCATCGACAACCAGACTGAGAACAACATTTCAGAAATTC^^ 

CCrATCGOCAAAACGTCATCCACAACCACACAGACAAGAATCTCAAAACATCTGTAG-TC 
850 860 870 880 890 900 

870 880 890 900 910 920 

CTCCATTT -CTCGCCCCCCACAACCCAAACCCAGATCCTGACTCCTACACTC - -CTATGA 

CTCACTTAACACTT - -TGACAACACCCTAAGCATGCCCTTCACCCAACACCTACCTCTGC 
910 920 930 940 950 960 
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930 940 950 960 970 980 

AAATAGCCGAAGCCAATAAGCTGAAGCTAACCCCTGAATATCTGCAGCT^ 

GAGAAGGAGGAGGCA GCCATTTCTAACTC GTTTCTATAGAAGCCCTGGGTAG 

970 980 990 1000 1010 

990 1000 1010 1020 1030 1040 

AGGCCA' rT GC T T C CAACAGCAAGATTTACTTTGGCAAAGACA"TTCCrAACA 

ATGCCTCAGCA- -CGGTGCCTTTTr -CGGGAGGAAA 
1020 1030 1040 1050 1060 1070 

1050 1060 1070 1080 1090 1100 

GACrCTGCGGGCAGTGTGAGCAAGCAGTTTCAGGGGCTAGCTGACAAG^ 
: : • 

CCCTCTGCA — C GTGACCTGTCAATATG — GTGCTAAATGT — GTCTATG- — -GAC 

1080 1090 1100 1110 1120 

1110 1120 1130 1140 1150 

TTAOAAGATGAAC-CCTTGGAGA-CCGCC ACTAAGGAGAATTGAAAAAAACTTGAT 

CCTGCTCTCCGTCTCCAGGCAGTTCTACCGTATACTTCGACCCTTGGGT^ 

1130 1140 1150 1160 1170 1180 

1160 1170 1180 1190 1200 1210 

ATCACTGCAAATGATACT-TAAGCAGATCrrTATTTTTTAAGATGAATC^^ 

ACrGCrGGTGTWATGTCAACA'TTCCTATAAATTC - AATTTCCCTCTGGA-GTTCCA 

1190 1200 1210 1220 1230 

1220 1230 1240 1250 1260 1270 

CCCTCCCCGACTACCTTCTCTGACTGTCTTCCAGTTAC^^ 

CGCTACGC— CTG— TGC-CAGGCAAAC- -CCTGTGCCTA- -GAACATAGCCTGGACGTC 
1240 1250 1260 1270 1280 

1280 1290 1300 1310 1320 1330 

ACTTAAATCCACTCCCTTTCTACGCAAAGGACGCTCGGGACTGAtGAT^^ 

ACAGCTACTCTGTACATTTCT GCTTGCTTCATTCC-TCTCTACTTCCACGGCTTAGA 

1290 1300 1310 1320 1330 1340 

1340 1350 1360 1370 1380 1390 

TTCAGCTAAGCACTTTATATCACTTCCAATAACATTTCTAAATC 

T- -CCACAAACAAGAGTCTAACCTTCTCATGCTCCCACTTT -TC -TCGATTAGAC-TTCG 
1350 1360 1370 1380 1390

1400 1410 1420 1430 1440 1450

ACCTCTACACACTAATrTTATCCTTTG A -CGCTCGCTTAATTAG - -GCATCCTGTCAT - T 

A--TCAATATTCTTCTAA-ATCCTCTGACAAATGATCTAATTAGAAGAAATC 
1400 1410 1420 1430 1440 1450

1460 1470 1480 1490 1500 1510 

AAGCAGAGCGACAAATGTAGACTCTTACCTCCAACTCATTTCATTTCCCr^ 
TTCCTGTGTGCATTGCTGGGACAAATGCCTC CATTAGAAA ATTCAAAGAAA

1460 1470 1480 1490 1500 

1520 1530 1540 1550 1560 

AATGCAGTCCAGTCTTCTCACCTCTG--CCTCCAAGGTAGGAGATGTCTC 

«•••• m \^ m m ** ITS 'I 

«« • V*** •»•• •••• ••*« • • , 

GTCATAATCGAGAAT-CTCTTTGGTGGTCCTCTAAGGCGGGT- -TGrmTCAATGTTGT 
1510 1520 1530 1540 1550 1560 

1570 1580 1590 1600 1610 1620 

TVWKCAACTGAGCAAATATGTGCCTGTGAGTTTGCC^^ 

. , *«••«•• ■ 

Z •■• • • • a ••>•* ■ • •••••• • 

TG-TCTT -GGAGCTTGGAGGTGAAATTCAATGT TTAAAATTTTTAGGAAATTTATA 

1570 1580 1590 1600 1610 

1630 1640 1650 1660 1670 1680 

CAGAGAA-CATTTGACCTTCCTGGOVTTCTTGT^ 

C^GAAACTTTTAAATAAAGTATATTGAATGT-GCCATGAAAAAAAAAAAAAA^ 
1620 1630 1640 1650 1660 1670


1690 1700 1710 1720 1730 1740 

TGTCCTTTCTTX^AGCCCntrATAAGGAAGTACl^^ 

CCCGCCG 
1680 

Huaa4 U TGTGCAGACAACACTACAAACTGATGAAGTTAAAAATGTWCTTCTG^ 

240 250 260 270 280 290 

40 50 60 70 80 90 

AGTCATGATCTATATTGACCGAATAGAAGTCCTTAATATCTTGGCTCC^ 

>•••■■•••>••■••>><••••••••■••■■•■*■•••■•••••#•«•«*•«•■■■•■■ 

•••*••*••••*••••■•«••>••■■•••••••■••••••«••••■■••••■••■••••« 

CGTCATGATCrATATTGACCGAATAGAAGTGGTTAATATGTT(X;CT C 

300 310 320 330 340 350 

100 110 120 130 140 150

TGACATTGTGACGAACTATACTCCAGACTACGACAAGACTTTAATCT^ 



TGATATCCTGAGGAACTATACTGCAGATTATGACAAGACCTTAATCTTC 

360 370 380 390 400 410 

160 170 180 190 200 210 

CCATCAGCrGAACCACTTTTCCACTCCCCACACACTTCAAGAACT^ 



CCATCAGCTCAACCAGTTCTCCAGTCCCCACACACTTCAGGAAGTTTACATO 
420 430 440 450 460 470 

220 230 240 250 260 270 

TCATCAAATAGATCAAAACCTGAACCAGCCCCTCCAAAAACATTTAAACACCATCCC^ 



TCATCAAATAGATCAAAACCTCAACCAAGCTCTCCACAAAGACTTAAACCrW 
480 490 500 510 520 530

280 290 300 310 320 330

ACCTCTCACTATCCAGGCTGTCCCTCTTACAAAACCCA.^AATCCCACAACCCATAAGAAC 



AGGTCTCACTATACACCCrCTCCGTGTTACAAAACCCAAAATCCCAGAAGCCATAAGAAG 
540 550 560 570 580 590 

340 350 360 370 380 390

AAATTTT G AATTAATCGAGGCAOAGAAGACAAAACITCTCATAGCTGCACAGAAACA^ 
AAATrrTGAGTTAATGGAGGCTGAGAAGACAAAACTCCTTATAGCrG(^

600 610 620 630 640 650 

400 410 420 430 440 450 

GGTGGTGGAGAAAGAAGCTGAGACGGAGAGGAAAAGGGCTQTTATAGA4^^ 

GGTTGTGGAAAAAGAAGCTGAGACAGAGAGGAAAAAGGCAGTTATAGAAGCAGAGAAGAT 
660 670 680 690 700 710 

460 470 480 490 500 510 

TGCACAAGTAGCAAAAATTCGATTTCAACAGAAAGTGATGGAGAAAGAAACTGAA^ 

TGCACAAGTGGCAAAAATTCGGTTTCAGCAGAAAGTGATGGAAAAAGAAACTC 
720 730 740 750 760 770 

520 530 540 550 560 570 



CATrrcrGAAATCGAAGATGCTGCATTCCTGGCCCGAGAGAAAGCGAAAGCAGATTC 
780 790 800 810 820 830 

580 590 600 610 620 630 

GTATTACGCTGCACACAAATACGCCACCTCAAACAAGCACAAAC^ 



ATATTATGCTGCACACAAATATGCCACCTCAAACAAGCACAAGTTGACCCCGCAATAT^ 
840 850 860 870 880 890 

640 650 660 670 680 690 

GGAGCTCAAGAAATACCAGGCCATTCCCTCAAACAGTAAGATCTACT^^ 



GGAGCTCAAAAAGTACCAGGCCATTGCrrCTAACAGTAAGAT^ 

900 910 920 930 940 950 

700 710 720 730 740 750 

CCCCAGCATGTTTGTGGACTCCTCCTGTOrrCTGAAATACTCTC 

» ««■•■••■■•• «* «««••■■«> >• 

CCCTAACATGTTCGUXXSACTCCTCATCTCCTTTGAAATATTCAGATATTA 

960 970 980 990 1000 1010

760 770 780 790 800 810 

AGAACACrCCCTTCCCCCAGAGCAGCCCCGTGAGCCCTCTCGAGAGACCC 



ACAAACCrCACTCCCCTCTAAGCAGCCTCTTCAACCCr^^ 
1020 1030 1040 1050 1060 1070 

820 830 840 850 860 870 

CAAiX;*\CAACCCACCrrGATGCi\ACAGGTGCAAATGTTCTCCCATATCAAOATC 



CAAAdAiJAGCACAGGTTGATCCAAGAGGTCCAAATGTTCTCC - ATATCAAGA TOTCGCCC 
1080 1090 1100 1110 1120 1130

880 890 900 910 920 930

AAGGCCCTAAGTCCCAACAGTCCTTATCTGGACTCGTAACATTCACAGAGA^ 



AAGCGGTTAAGTGGCAACAATCATTATACGCACTCTTCAGATT^ 

1140 1150 1160 1170 1180 1190
940 950 960 970 980
CT G TT G T lM TTCntrnx;TCATAGTCrTGGTTOa:CAGCTGACTAC^ 

■■ ■ ■ • A * 

>«••« • ••>• * • a, 

TCATCTGTTCCACCrCTCCTGCGATAGTCCTGGGTGCTCaVCTGATT^ 

1200 1210 1220 1230 1240 1250

990 1000 1010 1020 1030 1040 

AGCTGTCTGGCACTCAAACGGT<nrTGC\GCCACAGrTTTATC^ 
••••••••••"••*•"»• •»••• * 

AGCTGTCTGACACACAAATGGTCrTTTa^GCCACAGTCTTATCAA^^ 

1260 1270 1280 1290 1300 1310 

1050 1060 1070 1080 1090 1100 

TCCTTTGTAAACCGGTACTCATGAATCAGGGAAAGTCTGATGCTAAGATACTGCCTGCAC 

TCCTTTCTAAACTGCTACTCATCAATGAGG-AAACTCTGATC 

1320 1330 1340 1350 1360 1370 

1110 1120 1130 1140 1150 1160 

TCGAATGTCAAACACTATATAACAAGCTGTGGTTTTTAAAAGCTAT^ 



1170 1180 1190 1200 1210 1220 

ATTGGTGCCTGAGGACATGTGTGCTCAGACATTCAAGAGCTACGACGCCAGAGAGAAGAC 



TTCCCTG 



1230 1240 1250 1260 1270 1280 

CTTCAGAAAACGGTAAGTTAAAGAAGACAAGTGTCATCACACACTT^ 

CATTGGGTT GATGAC TGTCAGCA TCA 

1380 1390 1400 

1290 1300 1310 1320 1330 1340 

CTTTAAACTCrrACTCCCCCCATTCCrcCATGTCATTCACAGCCAGACCT^^ 

CTC CCC CACCCCA - 

1410 

1350 1360 1370 1380 1390 1400 

GGAAATTATCTTCCACTTGAATCACCATTTACTTCATACAAATTC 

TCCTTG--ACTAAG-GTACCT 

1420 1430 

1410 1420 1430 1440 1450 1460

CTAOTCACCTTGGTGGCCTGCAGGGACGCGTACTTTGCCACCCGACCAGAGGTTCCTCGA 

OGTT TTAGCCA - -CAGCCA CCTC - - 

1440 1450

1470 1480 1490 1500 1510 1520

AGATArTCCCAATCACTAGTTTATTCCGTTAGGAGACTCAGAGATATAGAAAGCACCTC 
— - — — CTTGTAT — —

1460 

1530 1540 1550 1560 1570 1580 

AATTTAAGGGAGATAAAGCCTGCACTGCACCAAAGCTACGGGTCCCTGTGTTTCC^ 



-* GTTACCT T 

1470 

1590 1600 1610 1620 1630 1640 

tcagtgatgtcatcaacctcactgtcccack:ccatgtgtgactaaagtgcccggtt^ 



TCAG CTCTGGCC AAGAG 

1480

1650 1660 1670 1680 1690 1700 

CCACAGACAACTGCTTAGATGTCACCTCTTG G CT G ACCAAAGCT^ 



TGGGACAGGGTTTTAAC 

1490 1500 

1710 1720 1730 1740 1750 1760 

CAGACATAGGACCAGTGTGCAArrCCTGAT-TCA--CTGCACAGTATTATCTCATAATTG 



CACAAATAGGAGCAGCATGCAATTCCTAGTGACTTGCTGCACAGTATTGTATCATAATTA 
1510 1520 1530 1540 1550 1560 

1770 1780 1790 1800 1810 1820 

CACGAATTATTTTTTCTTTTTAAAACTGGATTTGGGGCAC^ 

••••• • ••»•• « 

••••••••»•♦♦*•••••••• « •*••«««••• • 

CAGCAA---GTTTTTATTrrTAAAACTGGATCTGGGGTATAT^ 

1570 1580 1590 1600 1610 1620 

1830 1840 1850 1860 1870 1880

CTATCTAAACCCCAACCTTCTAGGGCTCCTATGGTCACTAACACACTCATTCTCCT^ 

CTCTCTAAAGGCCCAAGTCCTAOGGCTGCCATGCTCACAACCACACTCAT^^ 
1630 1640 1650 1660 1670 1680 

1890 1900 1910 1920 1930 

CTAATT — CTCG AACTCTGG AACAAAGTG - - ACCCAGACAGCATCCTCACT 



ATTCTrrATCnGCAGCCCACATACTGTGCAACAAAAAGTCACCTAGAAAGCATC C TT^ 
1690 1700 1710 1720 1730 1740 

1940 1950 1960 1970 1980 

CATCTTTGTCTCCTTCCCT <*CX;ATGCAGATACCCAAGTTf;CTTTTCCA.^CT 



CA rCATTGTCTCCTTCCCACCT»J(X\:c:AOAt;ATi:crrrA.W^ 

1750 1760 1770 1780 1790 1800

1990 2000 2010 2020 2030 2040

TTCGCCTCCOCTAiXJACATCAOAAAGAATTCTTGTGACTTCCrc^ 



i;TCACCTt:ccccAo<;AOA'n:A<;GA ttccactgacgtcctccgcagccagtcaattt 

1810 1820 1830 1840 1850
2050 2060 2070 2080 2090 2100

A-TTTTCCATGAGAAGATGACAGAGTTAGCCTCrrGGCTATAGGACUTCAT^ 

• • •••••••« ■ • 

AATTTTCCATGAGAA-ACAACAGAGTTAACCTGTGGCATTAGGAGACCT 
1860 1870 1880 1890 1900 1910

2110 2120 2130 2140 2150 

ACC -TTTTTGCCCATC^CATTAACTTTCCTGGAATATTGTGCTC^ 

ACCCTTTTTTTCCTTCAGTTTAACTTTTCTGGAGCA^^ 
1920 1930 1940 1950 1960 1970 

2160 2170 2180 2190 2200 2210

tctgcccagcttgtt- -gacagctcttgtgtatactgtgttgaagccagacagaaaagta 
tttgtgcagcttgttaagacaactcttgtgtacactatgttg;^ 

1980 1990 2000 2010 2020 2030

2220 2230 2240 2250 2260 
ATCGGGCCACTTCT-GAAACCTCTCAGCTGT TGA-— tctcacagcagctaaag 

ATGGGACCACTTCTAGAAATCTTTCAGCTGTCAGGCCTCTCAGTCTCATGACA^ 
2040 2050 2060 2070 2080 2090 

2270 2280 2290 2300 2310 2320 

GGTTGTGCCAAACA-TTTTATTAAGAAAGTAAACCCCAGATTTGAATGGGTC 

GGrrCTGCCAAACACTTTATTTCGGAAAGGAAAGCCCACATTTCAA 
2100 2110 2120 2130 2140 2150

2330 2340 2350 2360 2370 

AGGCCTTATAGTATAGACCCATTTCTAATATGGACAAAATAATTTTTC TCAT 

******** *********«**S*2**aB****4***«********* •*•* 

GGGCCTTATCCTATAGAGCCATTTCTAATATCGAGAAAATAATTTTTCATTTTTC 
2160 2170 2180 2190 2200 2210 

2380 2390 2400 2410 2420 2430

TTAATTATAGAAATTACCTTCAAACA--CATTTTGTGTTCTrrM -C-CCTTCAAA-TA 



TTAATTCTATAAATTCTCTTTATAAATGAATTTTGTGTTCTTTACT^ 
2220 2230 2240 2250 2260 2270 

2440 2450 2460 2470 
CTCCTCTTACATTGTTC CTC - C ACATAAATG ATGATTGTCGT 



CrrTTCAATTATAAAAATAAAATCTTTACCTCTCGAATTCTTC 
2280 2290 2300 2310 2320 2330

2480 2490 2500 2510 2520 2530

COOATA rCTGCATCACTCAGCTCTCTCCTTTCATTCCTAGACATCT^^ 

CGAAAATCTGGATCATTCACCTCTCTCCTTTCATTCCTAGACATCrr^ 
2340 2350 2360 2370 2380 2390

2540 2550 2560 2570 2580 2590

TA^n'OAAATGC'rerrCCCCCAAAG'nurG G TT G T G GGA' m 
-AGCAAAA-GCTGTTGCCCCAAACTGATGGCCCTGGAGG CGG GGC

2400 2410 2420 2430 2440 

2600 2610 2620 2630 2640 2650 

GGTGAGGAGCAGGGAAGCGCCATTCnXSAAAGATTAAAGAAAGCACITC 

- -TGAGGAACAGGGAAATGCCGCTGTGAAGTCTTAAA GCACTTCTGCTTAAACTCC 

2450 2460 2470 2480 2490 

2660 2670 2680 2690 2700 

TTATG GAGTGAGCTTCCCTGTGCCCACTCAGTGAACTAAGTCTGACCATCCTTCAG 

, : . : : : : : - : 

ATGTGTGAGGAGTGTGCCTCCCTGTGCCCTCTCAGC — TCTGAGGCTGGCCGTCTTTCGG 
2500 2510 2520 2530 2540 2550 

2710 2720 2730 2740 2750 2760 

GGACGTTCCTTlTGGTAAATATACACrGTAATCrra 

::::::::::::::::::: 
GGT-GTTCCTTTTGGCAAATATACACTGTAATCTT -GAGTCTAAATTTATATGTTGAAAT 
2560 2570 2580 2590 2600 2610 

2770 2780 2790 2800 2810 2820 

— TAACTTTTTT TAAAAACCTAAATAAAATTATTTTCCTATCAAAAAAAAAAAAAA 

GCPACCTTTTTTAAAATAAGAAACTAAATAAAATTATTTTACTATC^ 

2620 2630 2640 2650 2660 2670 

2830 
AAAGGGCGCCC 


AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA 
2680 2690 2700 
10 20 30 40 50

GTCGACCCACGCGTCCGGCGGGGACAACTGGGTCTTTTGCGGCTGCAGC-GGGCT^ 



GTCGACCCACGCGTCCGGC CTGCTGA- 

10 20 



30 



40 



50 



60 70 80 90 100 110 

GTGTCCGGCirrCCTGGCCCAGCAAGCCTGATAAGCATGAAGCTCOT 

GCATCTAGT CT T GC TGGCTCAGCAAGCCCGATAAGCATGAAGCT^ 

60 70 80 90 100 110 

120 130 140 150 160 170

GTGGTCGGGTGTTTGCTGGTGCCCCCAGCTGAAGCCAAC^ 

GTGGTGGGGTGCTTGCTGGTGCCCCCAGCTCAAGCC^ 

120 130 140 150 160 170 

180 190 200 210 220 230 

TGCAAATGCATCTGTCCACCTTATAGAAACATCAGTGGGCACATTTAC^ 

TGCAAATGCATCTGTCCGCCTTACAGAAACATCACCGGGCACATTTAC^^ 

180 190 200 210 220 230 

240 250 260 270 280 290 

TCCCAGAAGGACTCCAACTCCCTCCAaSTGGTGGAGCCCATGCCAGTC^ 

TCTCAGAAGGACTGCAACTGCCTGCATCTGGTGGACCCCATGCCAGTGCC^ 

240 250 260 270 280 290 

300 310 320 330 340 350 

GTCGACCCCTACTCCCT CCTC T CC CAGTGCAGCTACCACCAGCGCACCAre^ 

GTGCAACCCTACTCCCTCCrCTCreACTCTAGCTACCAC^ 

300 310 320 330 340 350 



360 370 380 

AAGGTCATCATTCTCATCTAC 



390 



400 



410 

rACATGGCCTTC 



AACCTCATTATTGTCATCTACCrCT 
360 370 380 



390 



rCTCTTACTCTACATCGCCTTC 
400 410 



420 430 440 450 460 470 

CTCATCCTCCTGCACCCrCTGATCCGAAAGCCCGATGCATACACTGACCAAC^ 

CTCATCCTCCTCCACCCCCTCATCCOCAA(:CCAGATCCCTATACTC 

420 430 440 450 460 470 



480 490 500 510 520 530 

i; AoC JAtJAATCAGGATGCrCc ;(rrcrATUGCAGCAGCTCCTCCATCCCTCGOCGGACCC 



IS ('loo) 
GAAGAGGAGAATGAGGATGCTCCCACCATGGCAACAGCCGCTGCGTCCATTGGAGGACCC
480 490 500 510 520 530 

540 550 560 570 580 590 

CGAGCAAACACAGTCCTGGAGCGTGTGGAAGGTGCCCAGCAGCGGTGGAAGC^ 

CGGGCAAACACTGTCCIWAGCGGGTGCAAGGCGCTCAGCAGCGGTGGAAGCTGCAGGTG 
540 550 560 570 580 590 

600 610 620 630 640 650 

CAGGAGCACCGGAAGACAGTCTTCGATCGGCACAAGATGCTCAGCTAGATGGGCTGGTC^ 
«■*«««•«■««••••••■••>•••*• «••*••• 

CAGGAGCAGCGGAAGACGGTCTTCGACCGACACAAGATGCTCAGTTAGATGGT-TTC 
600 610 620 630 640 650 

660 670 680 690 700 710 

GGTTGGGTCAAGGCCCCAACACCATGGCTGCCAGCITCCAGGCrGGACA;^ 

• .«••»,. •* .>> ««• • 

GATTGCATCAGAGACCTGG-GCCATGGCTACCAGCTTCTGGG CCT C 

660 670 680 690 

720 730 740 750 760 770 

TACTTCTCCCTTCCCTCGGTTCCAGTCTTCCCTTTAAAAGCCTC 

-ACTGCAGTCTTCCCT -GG- - - * --GTCTTCCCTTCAAATGCCCATGGCGTTTATCC-' - -T 
700 710 720 730 740 

780 790 800 810 820 830 

TCTCCCTAACTTTAGAAATGTTGTACTTCGCrATTT^ 



TCTCCCT- -CTCTAGAAATGT ACTCCACTGTTATAACGAGGGA -GTGTGATTGGGTC 

750 760 770 780 790 800 

840 850 860 870 880 890 

TCTGATCTCCGTTCTCTTCTTCGCT^^ 

TCTGTA GGTCT CTGGGGGGTAGAGGGGAGGGG-AGGGAAGGC-AGA 

810 820 830 840 

900 910 920 930 940 950 

ACCGAATCGACACATTCCAGGCGGCCTCAGCACTCCATCCGATCTCTCr^ 

ACGGAACAGACACATTTCACGTCCCCACATGATTCCCTCGAATTCATCCCTC 
850 860 870 880 890 900 

960 970 980 990 1000 1010

ACrCTTGCCCCCrrCCACCTC^AG*rCTTCCCAATGrrcTTACC 



AC -CATTCCTC CCAGCrCCACATCrTAAv^GArJC - -TTAC CGCAGACGAAGCT 

910 920 930 940 950

1020 1030 1040 1050 1060 1070

CCCTCTTCACCAACirACTCrcrCGtVXOOAAAGCATGCCCCACCATTCACCATCTCTTC^ 



CTCTCATCAAGAGi:TCAG'n'«"Cri;CCAwGAAAGTATGATCCAGCCCTCAGCCTTCCCTC^ 

960 970 980 990 1000 1010



WO 00/18904 PCT/US99/22817

62/112 

1080 1090 1100 1110 1120 1130

TTTCTGCAGTGGTTCTTTATCACCACCTCGGTCCC;^^ 

""• ••••• • • z z z irii M. ' " * • ?iT 

AGGATGCTGItXntXCCATTC-CCAGTTCCTT' -CAGTGCCA 

1020 1030 1040 1050 1060 1070

1140 1150 1160 1170 1180 1190

CAGCTCCAGCCCTGAGGACAGCTCTGATGGGAGAGCTGGGCCCCCTGAGCCCACTGG-G 

-TACCCCAGTC-TCAGGA ACTGTTG TGGTGCCCCTGAGCCCACAGTCAT 

1080 1090 1100 1110 

1200 1210 1220 1230 1240 1250 

CTTCAGGGTGCAC-TGGAAGCTGGTGTTCGCTGTCCCCTGTGCACTTCTCGC^ 

CrcCAGAGTCCACCTGGAAGGCTGT-TCCCCTCTCCTCGGCTC-CTGGTC-CACCAGTTC 
1120 1130 1140 1150 1160 1170

1260 1270 1280 1290 1300 

ATGC-AGTGCCCATGCATAC TCTGCTGC--CGGTCCCCT--CACC-TGCACrrGA 



ATGGCAGTGCCCATGCATGCCCGCATATTCAGCAGCTGTCACCTTACTCCCATCCC^ 
1180 1190 1200 1210 1220 1230 

1310 1320 1330 1340 1350 1360 

rAGTCCXTTCCTCTCCCCAGTCTCCACAGTCACTCAGCCAGACGGTCGCTT 



GGCCGTAAGGCC -TCCCACCTCTCCCCTGTGACTCCAGCTGCTGAGCCATAA AGTT 

1240 1250 1260 1270 1280 1290 

1370 1380 1390 1400 1410 1420

GGAACATGACACTCGACarrcAGCGTGGATCTGAACAarACAGCXCC^ 

GGACCATATGACACAAGCCCAAT-GCGGACCGGAGTACCATGGCTCCTCTCCTT^ 
1300 1310 1320 1330 1340

1430 1440 1450 1460 1470 1480 

CC lXr in Xi TCCCTGAACTTC G T TC TACCAGTGCATGGAGAGAAAAT T T ^ 
••••••«••■•■•• ••>•••••• • ■ 

TCTCTTGTCCGTG AATTTCATTGTATCA -TGCATGGAGAGAAAAAAAAAAAAAAAAAAAA 
1350 1360 1370 1380 1390 1400 

1490 1500 1510 1520 1530 1540

TACAGTT C TCTCTAAATCAAGCAACCCATCATTAAATTGTTTTATT^^ 



AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA 
1410 1420 1430 1440 1450 1460

1550 1560 
AAAAAAAAAA GGCCGCCCG 



AAAA.\AAAAAA/VVV\AAAAAAAA/\A/\AAAAA^^^ - 
1470 1480 1490 1500 1510
10 20 30 40 50 60

GCACGAGTCCAGACGGAAGTGCGGGCGGAGGATCCCCAGCCGGGTCCCAAGCCTGTGCCT 



G-TCGA- 



-CCCA—CGCGTCC- 
10 



70 80 90 100 110 120 

GAGCCTGAGCCTGAGCCTGAGCCTGAGCCCGAGCCGGGAGCCGGTCGCGGGGGC^ 



-GGGC GC-GGGGCTCG- 

20 30 



—GGGC TCGCAGGAGC- 

40 



— GG 



130 140 150 160 170 

CTGTGGGACCGCTGGGCCCCCAGCGATGGOGAC CC TGTG G -- -GGAGGCC^ 



CT- 



- -GGCTCCC-GCGATCGCGAGCCTATGGTGCGGAAACC 
50 60 70 80 



90 



180 190 200 210 220 230 

TGGCTCCrrCCTCAGCCTGTCGTGCCTGGCGCTTO 



GGGCTCGGGGCTCAGCATGTCCTCC 
100 110 120 



130 



140 



150 



240 250 260 270 280 290 

AGACGCCGCCAAGAATTTCGAGGATGTCAGATGTAAATGTATCTGCC^ 

•* •> •••*««•«••••••••••»» 

•••*••■•■■■•••*■■* •••«»•••«§* •■••••«>•••••*•**••• 

AGGCGCCCCCAAGAATTTTGAAGATGTGAGATGTAAATCCATCTCCCCTCCCrATAAA^ 
160 170 180 190 200 210 



300 310 320 330 340 350 

AAATTCTCGGCATATTTATAATAAGAACATATCTCAGAAAGATTCTGA 

GAATCCTGGCCACATTTATAATAAGAATATATCTCAGAAAGATTGTGATTCC 

220 230 240 250 260 270 

360 370 380 390 400 410 

TCTCCACCCCATCC C T C TG C CCCCGCCTCATGTAGAAGCATACTCTCT 

CCTCGACCCXATCCCTCTACCCGCACXnxSATGTACAAGCATAC^ 

280 290 300 310 320 330



420 430 440 450 460 470 

CAAATATCAAGAAACAACCTCTCTCACAATCAAGCTTACCATTATAATTTAT^^ 

CAAATACGAACAGACAACCTCTCTCACAATCAAGCTTACCATTATAATTTATCTCT^ 
340 350 360 370 380 390 

480 490 500 510 520 530

TTTCGCCCTTCTA C TT C T C T A CATCCTATATCTTACTC TC^ 
TTTCWJCCTTCTGCTTCTGTACATG^

400 410 420 430 440 450

540 550 560 570 580 590

GCGCCTCTTTGGACATGCACAGTTGATACAGACTOATGATGATATT^ 

• •« • 

GCGCCrCTTTGGACACrcCCAGCTGTTGCAGAGCGATGATC^ 

460 470 480 490 500 510 

600 610 620 630 640 650

TTTTGCAAATGCACACGAlXnX3CTAGCCCGCrcCCGCAGTC 

TTTTGCAAATGCCCATGATGTGCTGGCCCGCTCTCGCAGCCGAGC 

520 530 540 550 560 570 

660 670 680 690 700 710 

GGTAGAATATGCACAGCAGCGCTGGAAGCTTCAAGTCCAAGAGCAGO;^^ 

GGTGGAGTACGCTCAGCAGCGCTGGAAGCTCCAGGTCCAGGAGCAGCGAAAGTCTG^ 
580 590 600 610 620 630 

720 730 740 750 760 770 

TGACCGGCATGtTCTCCnx:AGCTAATTGGGAATTGAATTC^ 

CGACCGACACGTTGTCCrcACCTAACrGGGAACTGGAA 

640 650 660 670 680 690 

780 790 800 810 820 830 

CGCAGACAACTXXIAAAGAACTGACTGGGTTT^ 

CGCAGACAACTGGGAAGAATTGTCTGOCTGT- -CCGTG CGTTTTAATGCCATGTTTG 

700 710 720 730 740 

840 850 860 870 880

TTT - • -CA- - -CCAA -CTG >TTGCrCGAACATTCAAAACTGGAACCAAAAAC*TT GCi n' G 

TTTTTACAAATCCTTGCTOGATGGAGGAAGACTCCAAAC^ 
750 760 770 780 790 800 

890 900 910 920 930 940 

ATTTrrTTTTCTTGTTAACCTAATAATAGAGACA 

GTATTTT CCTGTTAATATATTAATAGAGACATTTTTACA -CCACACAGTTCCAAGTC 

810 820 830 840 850 860 

950 960 970 980 990 1000 

ACCCAATAAGTCTrrrCCTATTTCTCACTTTTACTAATAAAAATA;^ 

AACCACTAACTCTTTTCCTACTTCTGACTTrrACTAATAAA^ 

870 880 890 900 910 920

1010 1020 1030 1040 1050 1060

TA rCTDGAAirrCCTTTACCTCGAACAACCACTC^^ 

TATtrrT!;AAi;CCCCCTCCCn;G/\AC#\ACCTCTCTCTTTC^ 

930 940 950 960 970 980
1070 1080 1090 1100 1110

-TCCTCATGGAAATGTC TGC-TTTATGAAACT-ATGCACATATTGAAAGTGAGTTG 

GTGTTCAAGATAACTTCCAGGTX3TGTTTTTGCTTCT 

990 1000 1010 1020 1030 1040 

1120 1130 1140 1150 

AAA CAAATGAGGG-TTGGGTAG GAG-CTT- -CCAGGC CTCQGA 

. : : : : : : : ; , : : : : : : . . 

GAAGGAlXK:C7rTGGGA G T GC TT GA GTAGCTTCrCAA(^ 

1050 1060 1070 1080 1090 1100

1160 1170 1180 1190 1200 

TTTACACCACGCCTA - -GCCCAGCAGAGGCCTTAGTCCCATT-TGG- -GGCTT- - -TCGG 
: : . : : . • : . : : . : : ; , : : : : : : , * • 

AATACTTCAGACCCTCTACTTCACACTTGTO 

1110 1120 1130 1140 1150 1160 

1210 1220 1230 1240 
AG -TGACATTTGCT-TGA-GGCTTATACA CTGGT ~G 



TGCTGGCCTCCCCACTTGACTTTTGCACTGACTACAT^ 

1170 1180 1190 1200 1210 1220 

1250 1260 1270 
tGGTTCCCTGGCTTG- -CAG— GAAATGA CCAAG CTCACA 



TGGCItSCATTTCATGACCAGTTGGATCTGAAATGCC^^ 

1230 1240 1250 1260 1270 1280 

1280 1290 1300 1310 1320
CATGC IXXXrrGAAGCGT-AAGMR-KACAACTGACGTACrrCTTrrGA 



TTTGTTTCATGCACTGTGATGTCTGACGCAA^ 

1290 1300 1310 1320 1330 1340 

1330 1340 1350 1360 
ACCATGAACGTCGTG - -GATTCTCAGCC - CTGGC GGTCTTCCTCA -C 



TACTTTACACrCATACCTAAACACAGTCTCACTGTGTCTGGTCT^ 

1350 1360 1370 1380 1390 1400 

1370 1380
CTG ACCAC — CTT CAGACCC ACCC 



CTACCrCTMCCACTTCAACArrTACAATAAACACATTTTC^^ 

1410 1420 1430 1440 1450 1460

1390 1400 1410 1420 1430
TTTCTAGTT TGCArrTCCTGCTGCACACATTTAAGGCATA ACACCACAT 



TtWATCATTCACCTACAAATACTC^AT -CAGCCTTTTCTC^ 

1470 1480 1490 1500 1510 

1440 1450 1460 1470 1480

TCATcccTT-TGtrmY; - • -OGA-nrr cagcaatacact-- -cc -catccaaaoat- 
TGAACTGATGTGGGCAGCn'TTGAACAAGGACTAGAGT^
1520 1530 1540 1550 1560 1570

1490 1500 1510 1520 1530 1540 

TCTCTGGTTTTATGG CTmnntXiCmtrr ^TTA^ 

TCTAACAGTTATTCGATAACTGGCTTTTTTCT^ 
1580 1590 1600 1610 1620 1630 

1550 1560 1570
GTCTTTX5AATATGAATGTATTTGTAAAATAAAAAA 



aaaataatttacaaaacccaaaaaaaaaaaaaagggcxxk:c6 

1640 1650 1660 1670 1680 
10 20 30 40 50
t4u*^**i GTC0ACCCACGCGTCCQ---CTCTQAGTCACCCQAATCTAG6T8GGGC --CGCC-C6 

• ••■•■•••■■>••■•••■ m m A m * 

• ••••>•••••■••■•> *•« aa^a • 

HLlKiHC GTCGACCCACGCGTCCGGCGCTCTGAGTCACCGGAATCAAGGTGTGGCTGGAGCGCCGCT 
10 20 30 40 50 60 

60 70 80 90 100

GAGCGGCGTCCT CGGGAGCCGCCTCCCCG CGGCCTCTTCGCTTTTCTGGCG 

CCCCCX3CCGCaVGCCCGGGGGCCGCGTCTTCGGGCGAGCCGCCTC^ 

70 80 90 100 110 

110 120 130 140 150 160

GCGCCCCCGCTCGCAGG - CCACTCTCTCCTGTCGC - CCGTCCCGCGCGCTCCTCCGACCC 

GTGTCAGCGCTCCCAGGACCACTCTTGGCCGCTCCTCCTGCCCG - GCQTTCCTCCC 
120 130 140 150 160 170 

170 180 190 200 210 220 

GCTCCCCTCCOCTCCGCTCGGCCCCGCGCCCCCCCTCAAC»TOATCCGaXJCGGCCT^ 

Ilfiil • ••••••• • 

-CTCCGCGC CCGC CGCCACC-GAaSAOlTGCTGCGCTGCGGCCTGGC 

180 190 200 210 

230 240 250 260 270 280 

CTGCCACCGCrCCCCCTGGATCCTCCCCCTGCTCCTACTCA 

CrCCCAGCXJCTCCAGCTCGATCCTCCCCCTCCTCCrCCTCACOT^ 
220 230 240 250 260 270 

290 300 310 320 330 340 

CATCGCCCTCGCCGGCCCCXSCCrGGXlGOVGTCTAGCGACCACGGCCAGACGTCCTCGCT 

• ••••••«•••■ •••«> 4a*«aa 

a«a«»**»aaaaaa>*a*«a«»a*« aaaaaaaaaaaaaaaaa* *•••••••« 

CATCCCCCTCCCCXXJCCCCGCCTCGCTGCACTCTACCAA^^^ 
280 290 300 310 320 330 

350 360 370 380 390 400

GTGGTGCAAATCCTCCCAACAGGGCGGCGCCACCXXXn*CCTACGAGGAGGGCTGTCAGAG 

TTCCTCGACCTCTTTCCACCAGGCCCCCCGCACCGGCrCCTACGACGATGCC^ 
340 350 360 370 380 390 

410 420 430 440 450 460

CCTCATCGAGTACXCCTCCCGTAGACCACCGCCTGCCATGCTCn'CTC^ 

CCTCATCCACTACCCATCCCCACGACCACCTCCACCCACGCTTTTCTCTC 
400 410 420 430 440 450 
470 480 490 500 510 520

C Clt W T CMiaCiUi ' r TC AICClLiLLiiLrX ' CQlCLiLiLiUM lCCCCAGA ^ 
x::s ::::: s::tt ::::: : : : s : 1 1 1 : : : < s s s i s i s $ zs 

CCriUTUGA'XCriJCrrCAl rCl'C XCUllU'ttUCCCTUlUlUi^ACCCCAGAXUC llUl i'X"i' 
460 470 480 490 500 510 

530 540 550 560 570 580 

:: i t ::::::::: i s s 1 1 : 

CCTTSAGAGTCATTGGAGGCCTCCTCCCACIUGCrCTCCATATTC 
520 530 540 550 560 570

590 600 610 620 630 640
AAXTTflCCCCXnrGAAGtACACCCAaACCTTCACCCCT 

::: :::::: ii << s : 1 1 f i i ttttitstti ::::: : 1 1 1 tttttttt t tit 

580 590 600 610 620 630 

650 660 670 680 690 700 

CATC7ATAAOX3GGCCTACXK3CTTTGGGTGGGCAGCCA 

::::: :: ::: zr: : 

CAIXn^XAACTGCGCCTATCfGCTTCGGATGGQCXXvCCACC^^ 
640 650 660 670 680 690

710 720 730 740 750 760 

Cl ' iri iC r rCTC X: T GCC TOC CrM CTaOCWUSATGA CLllCITO GC^^ 
I : ; ; : I : : : : : : : X ; : : s t s y s : X t : 1 1 1 1 * t : : : 1 1 : : : ;:x: • stttii::ttx 
LirCirCI TCT CCTGCC TCCC a UUmiCCAGC^TGAC ClTn 
700 710 720 730 740 750 

770 780 790 800 810 820 

CTA C TT C T A CAa V TCT G CCTAA L - riUUG AATCA A TGT GC GAGAAA Al^ 

: tttti* :::xtt{ ; :; 

OPlLTTCTATCCCCCAGCCTAAlUTUXtfV(KSAACJtfj^ 
760 770 780 790 800 810 

830 840 850 860 870 880 

ATCX»CTCCACAjU3AAGAAA CnUllTCirU lCCCTA CTTXl^ I ' l lUiLA CTO 

; ; X t t : : . . : : : : : : : ; : : : : t . m 

ATGGAT- -CTQACaACGAAA LlGi I '-CTCCAATOCAauVCaAACrrAC ^ Tri'C kaSC^ 
820 830 840 850 860 870

890 900 910 920 930

TTCATATTATrAAACTAGTCAAAAATSCTAAAATAATTT -GGGACAAAATAll 1 i £ I'AAC 

• • *■ 

■■ ••«« 

TTCATATOAT CAGAAATCCrAGAATAAATCCrAAAGAAAATTCTTCATAAT 

880 890 900 910 920 

940 950 960 970 980 990

TACTCTTA7ACTTTCATCTTTATCTTTTATTATCTTTTCTGAACTTC 

TACTCTTA- AGT ITtLATCTATCTCCT - -OTCGACTTAAAAA CA C'fTGAAT TCTC 

930 940 950 960 970 

1000 1010 1020 1030 1040 1050 

A TTAC C T A TACTATCCCAAT Ari ILLl lA TATCTATCC-ATAACATTTATACTACATTTC 

TTTCCTAACTATATCCTAA rrrriCCl lA TCTCAATTCTATACCATrrAACCTTCATTTC 
980 990 1000 1010 1020 1030

1060 1070 1080 1090 1100 1110 

Fl6i IT CiarW^ 
XjVAOUaAATATCCACOTCaUUU^^

. , 1 1 i z 1 1 : , i i 1 1 1 jpi 5vUI§ * 

iiuUWWlTATv,JCTOraAAACT^ TlUUanAOAAATCn3«SUJ3LViLiLAT 

1040 1050 1060 1070 1080

1120 1130 1140 1150 1160 1170

J,,, ... : sill jj::s: ^ - - - — ^ ;j:ti:j 

TPUTWTCTGATGGGC XiaLiUi 1 'TTT^^ ^ XCTGCIMCOGC 

1090 1100 1110 1120 1130 1140 

1180 1190 1200 1210 1220 1230 

innuts TAA£X3AGAAGAGGAA<»TAAGCTIAAAA0TTaTrW ^ 

: : :::: : : : : 
TACWSAGCM-GAAAGTCACTCCOUU^ -TCCCTOACCaAAXATCCOTUlAllWOm 
1150 1160 1170 1180 1190 1200

1240 1250 1260 1270 1280 1290 

ATOCAAAAAAAAAGTrTATTTrCAACCCTOOA- - ACTATTTAAGG^ - AAAOCAAAATCA 

- ^2 ' - ' - \]J ' 

i^^trrTTAAAAAGACCTTATTTTra 
1210 1220 1230 1240 1250 1260 

1300 1310 1320 1330 1340 

TTrCCTAAATCCATATCATTTCTO«»ATTT^^ 

. tixt . i » • • • 

i^TOCTAACTGAGCA iru ^ iTTO TC a AGA Al ' il i i i'OAACAAi. iJii i i 

1270 1280 1290 1300 1310 1320 

1350 1360 1370 1380 1390 1400 

AOCrAACXKrrTCATenTCACTCQlTAT^^ 

. : : : : t : : : . • x 1 1 . i : : : : IlllJ^ 

TrcrAAO- CTrCU ' lOi iC A CXTrC T C TGATCCGTACAAAACT CTTTCTAA 

1330 1340 1350 1360 1370 

1410 1420 1430 1440 1450 1460 

CCTCTTOCCATACrrrCGTAACC Li ULLX 1 iA ACTGTCAAATATTTACATGAAAi i^X^X 



C- -CTACCCAACCTTAA-GCOSCT^^ TQAAATGCTAA- .GAATTTTCCT 

1380 1390 1400 1410 1420

1470 1480 1490 1500 1510 1520

CTrrrAAACrrnmTATACCCrrrAGGCTCTCCGAAAA 

. , . J , ... : : : : : : : z ; i s j i t u : .:-!:'• s ' ' 

^i^CCCTACTCTAGACCCCTAO^^ 

1430 1440 1450 1460 1470 1480 

1530 1540 1550 1560 1570 1580 

(rrmirrcTTrATATCTTCACAAcaw 

ATTC TC TC- .TCTATCCTTMAACCACCGTACACOT 

1490 1500 1510 1520 1530 1540 

1590 1600 1610 1620 1630 1640 

TTTATCATCACnXJATACATCTOrn'AACTTtrr^ 

•^cccnrcxAACTO 

1550 1560 1570 1580 1590 1600 

1650 1660 1670 1680 1690 1700

CTCACAAAAJTCCCACTAAAACACCCTCACCACAATAAATCAC TTCCTTTTCTAAA 
GTCAQlOCACTGCCftTCaiGftCATCCT'Ago a ^^ i ' LTLA tSTG
1610 1620 1630 1640 1650

1710 1720 1730 1740 1750 1760

::. : } I • X : : * : : t : 2 . t. : sit::: :;. .::::•:: tttttttt 
CriUriLLLi iiU lCTCAC Xr CCT G 'CrCACACA^ 
1660 1670 1680 1690 1700 1710 

1770 1780 1790 1800 1810 1820 

CAGIUUCCTRCAyATAgTTiAAA Ar C C T GG T C T' r^^ ^ 

t;«xisfit: :i.ii«:ii::: iix:>tii: t : : : : t : } « 3 $ : s ::.txs:.xs:: 
CAQAAACCTOAATgrAATDVAAA- Ct iUUxiui lU.llUiiJ UM3CAGACTTAAiVAIATCTC 
1720 1730 1740 1750 1760 1770 

1830 1840 1850 1860 1870 1880 

ATATAAAACATGCC3U3U3GAGA A TT C G CG CUTT^ ^ 

s:;x:: > .:::.:::: :::.:: t :i ::r$::x:ii ix:; 
-TATAGTACATGCAAaTGaAAAATTTCCGAAT* -G0STQTCTCTGAA3A - CA.TACCGCSAA 
1780 1790 1800 1810 1820 1830

1890 1900 1910 1920 1930 1940

ATGCATCGGAXAOGTCAl lAliUATi'lTlTACCAl 1 iCUACTTACATAATCSAAAACCAATT 
. ::»»x ••11 : x: i:::ts:x: :::t:x xxx::«xx:x :• x 

GGOCCACmm- - •CCir- - - -TTCCrXACCATTTATACrXllCXrCAATGC^^ 

1840 1850 1860 1870 1880

1950 1960 1970 1980 1990 2000 

CATTTTAAATATCAaATTATTATTTTGTAAGTTCTCGAAAAAC^^ 

•:tx:t: xitx:::. i :::::::::;: : :::..: ;«:::. :::::xitx 

TOrm ' A ACTATCAgAACACT A rmXJl' A A lM T GC T C C^ 

1890 1900 1910 1920 1930 1940 

2010 2020 2030 2040 2050 

TATGMCTrncaaiATMkc^ ^ 

:x :• XX X X : X : X X X ; : X X X X x . t X X • X X X : X : : X X X X z X X X 
TAC-OU U:! irCCCA ATAAACCA flOXailUU WUWWUVAAAAAAAACAAAAAAAAAAAAA 
1950 1960 1970 1980 1990 2000 

2060 

AAAAAAAAAAAAAAAAAAACCGCCCCCCC 
2010 2020 2030 
10

HUKf^O GTCGACCCACGCGTCCG 

M»-^W<? GTCGACCaiCGCGTCCGCGGACGCGTGGGCACTCGGCCACTCTGCGGAGCAGG^^^ 

10 20 30 40 50 60 

20 30 40 50 60 70 

GCCGCGCGCTCTCTCCCGGCGCCCACACCTGTCTGAGCGGCGC^ 

:5S; ;;;jjr::;i:;s!:!: • • 

GCC^GCGCGTCCTCCGGGCGCCCACACCTGTCT^ 

70 80 90 100 110 



80 



90 



100 



110 



120 



130 



GCGGGCTGCTCCACGCGGTA- -GCACTCAGCATGGCrGGAATCCCGGGGCTCTTCATCCT 
120 130 140 150 160 170 

140 150 160 170 180 190 

ItrrCTTCTTTCT G CT C TGTGCTGTTCGGCAAGTCAGCCCTTACAGTGCCCCC^^ 

... : ; :::::::::: : . : : : : : : : : : 

TCrrGTC ClX ? Cl^lX j lX j TCTTC A TGCAGGTGAGTCCCTACACCGTTCCCTGGAi^ 

180 190 200 210 220 230 

200 210 220 230 240 250 

CACTTGGCCTGCATACCGCCTCCCTGTCGTCTTGCCCCAGT^ 

CACAl^CCGGCTTATCGCCTCCCTGTAGTC^ 

240 250 260 270 280 290 

260 270 280 290 300 310 

GCCAGACTTTCCACCCGAAGCCAAATTACAACTATCTTCTTCATGTCGACCCCAC^ 

CCCAGACTTCGACGCCAAACCCAAATTCGAGGTCTCCTCCTCATCTC 

300 310 320 330 340 350 

320 330 340 350 360 370 

TAAGGGAACTCCACTGCCCACTTACCAACAGGCCAACCAATATCTGTCrrA 

CAAGCCAACACC.ACTCCCCACCTACXAACACCCCAACCACTACCrrTTCCTATCAAACCCT 
360 370 380 390 400 410 

380 390 400 410 420 430

CTATGCCAA'rGGCACCCGCACACACACCCACCTCCX3CATCTACATCCTCAGCAGTAC7CC 

TTATCCCAATGGCACCCGCACACACACTCCCCTCCGCATCT^^^ 

420 430 440 450 460 470



440 450 460 470 480 490

AwATa;r.i;t:ccAACAca:AOACTc:AGi7G'rcrrc:Acc/vv\irrcrrcGAAccA^^^ 
AGGCAGGGCACGAGGCAGAGACrCGGAGGCCACAGGGAGATCTCGCAGGAAGAGGCAGAT
480 490 500 510 520 530 

500 510 520 530 540 550 

TTAlXKXrrATGACAGCAGGTTCAGCATTTTTGGGAAGGAC^ 

TTATOGCTACGATGGCAGGTTTAGCATTTTTGGGAATC 

540 550 560 570 580 590 

560 570 580 590 600 610

CTCAACATCAGTGAAGTTATCCACGGGCTGCACCGGCACCCTGGTOT 

**>■••••• aa aaa*k**t •*av«**«*«>*»a*>a*aa*,- .« 

• «*aaaaaaaaaa**a«**aa «• •••vaaaa aaaaaaaaaatcaavavvwaaa* I* 

cxx:aacatcggtgaagttgtctactggctccactggcaccct 

600 610 620 630 640 650 

620 630 640 650 660 670 

cctcacagctgcccactgcatacacgatggaaaaacctatgtgaaagg^^ 

aaaaaa •••■•■••••aaaaaaaaaaaa* avaaasaaaaaBBBBva aa aa 
• aaa*aaa«va*>«*«»***va«**««av««a*a«aa«a*aa4»«a«««aa aaavaaa* 

cctcactgctgcccacixktatacacgatgggaaaacctatgtg^ 

660 670 680 690 700 710 



680 



690 



700 



710 



720 



730 



ccgagtgggcttcctgaagcccaagtataaagatogtgccgaaggggacaacacct^ 

720 730 740 750 760 770 

740 750 760 770 780 790 

ttcagccatgccggagcagatgaaatttcactcgatccgggtcaaacgca^ 



CTCACCCATGCCAGAaUVGATGAAGTTTCAGTGGATCCGCGTGAAACGCACC^ 

780 790 800 810 820 830 

800 810 820 830 840 850 

CAAGCGTTGGATCAAGGGCAATGCCAATGACATCGGCATGGATTATGATTATCCCCT^^ 

• aaa«» aa«a**«a*»««««««a«****««*B«aa«BgaaBaaa**a «■•■• 
aaaaaa ««aaaaaaa«a«>aaaaaaaaaaaaa*aa*»aa*v»*aaaa aa aaaa^ *« 

CAACGCGTGCATCAAGCCCAATGCCWVTGACATCGCCATGGATTATGACTA 

840 850 860 870 880 890 

860 870 880 890 900 910

GGAACTCAAAAACCCCCACAAGAGAAAArrTATGAAGATTGGCGTGAGCCCTCCT^^ 

GCAACTCAAGAAACCCXrACAAAACACAGTTCATCAAGATTCGTCTGAGTCCrc 

900 910 920 930 940 950 

920 930 940 950 960 970 

CCAGCTCCCAGCaxrACAATTCACrTCTCrrG^ 

GCAGCTCCCAGCGG(3CAGGATCCACTTCTCTGCrrATGAC^ 

960 970 980 990 1000 1010 

980 990 1000 1010 1020 1030

GGTCTATCCCrTCTG'rGACCTCAAAGACCAGACCTATCACTTCCTCTACCACCAA^ 



GCTGTAirCCCrrcn n-t'A-n ;T«rAAAGAn:ACAv:CTACCACCTTCTCTACCAi;CAon;TCA 
1020 1030 1040 1050 1060 1070
1040 1050 1060 1070 1080 1090

5 TOXCACCCAGGGGCCACCGGCnCTGGGGTCTATOrG 

r;«i«*rr iiiJiiJ" ;•••••••♦-«•••••••••«•'••••■••" 

CGCCCAGCCCGGGGCCAGTGCrrrCAGGGGTCTATGTGAGGATGT^ 

1080 1090 1100 1110 1120 1130 

1100 1110 1120 1130 1140 1150 

; gaagtgggagcgaaaaattattggcattttttcaggck:ac^ 

:::::::::: : : : : : 2 ::: : 

GAAATGGGAAAGAAAAATTATCGGCATCTTTTCAGGGCACCAGIW^^ 

1140 1150 1160 1170 1180 1190 

1160 1170 1180 1190 1200 1210 

! TTCCCCACAGGATTTCAACGTGGCTGTCAGAATCACTCCTCT C ^ 

CIXrrCCACAGGATTTCAACGTGGCAGTTAGAATCACGC^^ 

1200 1210 1220 1230 1240 1250 

1220 1230 1240 1250 1260 1270 

CTATTGGATTAAAGGAAACTACCTGGATTGTAGGGAGGGGTGACACAGTGTTC 

CTAITGGATTAAATC^^ 

1260 1270 1280 1290 1300 1310 

1280 1290 1300 1310 1320 1330 
GCAGCAATTAAGGCTCITCATGTTCTTATTTTATC 

::::: :::: .:: ::: :::iz,::::: 

CaWKrACCAATOG-TCTTTTTGCACTCAW TAGCTTTTTATCATT 

1320 1330 1340 1350 1360 

1340 1350 1360 1370 1380 1390 

GGCCTGCACAaSTGTGTGTGTGTGTGTGTGTGTAAGGT^^ 

G ACTCTTGTG CTGTGAGTCA CATAGTATCTTTTACCTAGT 

1370 1380 1390 1400 

1400 1410 1420 1430 1440 1450 

TTTCTTACAATTGCAAGA «TGACTGGCTTTACTATTTGAAA^ 

ATTCTTCAAATGGCAAAAATTATTCGCTATATTATTTTAAAACTC - 
1410 1420 1430 1440 1450 1460 

1460 1470 1480 1490 1500 1510 

CATATATCATTTAAGCAGTTTCAACCCATACriTTCCATAGAAATAAAA^^ 

- -TATAGCATTTAACCAGTCTCAAAGCATACTTTTGCATAGAGACT^ GTA 

1470 1480 1490 1500 1510 

1520 1530 1540 1550 1560 1570 

TTGCGCCAATCAGGAATATTTCACAATTAAGTTAATCTTCACCTT^^ 
: : : : : :::::::::: ::::::: : . : : : : * : 

TTCCGGTAATACCCCCTArrTGACAACCAACTTAAACTTTCAGT^^ 

1520 1530 1540 1550 1560 1570 

1580 1590 1600 1610 1620 1630

TTTTTATTT»:ATCTGAACrTtTrTT<:AAAi;ATTTATATTAAA'r^^ 



TTTTTGTCTGATCCAiUVCTTGCTTCAGAGGT^ 

1580 1590 1600 1610 1620 1630 

1640 1650 1660 1670 1680 1690 

ATGAATTCTTATATGTGTGCATGTGT- -GTTTTCTTC^ 

ATGAATTCTTATGTTTGTATATCTATATGTTTTCT^^ 

1640 1650 1660 1670 

1700 1710 1720 1730 1740 1750 

TTTTTTGTTTTTTTAATTCACn^ 

-ATATTGATATTTTTGTAATGTG--TGGT-TATTATGCTTC 

1680 1690 1700 1710 

1760 1770 1780 1790 1800 1810 

TTAGGAACTTTGACAGCATTTGTTAGGCAGAATATTT^ 



GATAATGATAGCA 

1720 1730 

1820 1830 1840 1850 1860 1870

TAGTCTTTGAACAGTAAAATGATGTGTTGACTATACTGATACACA^^ 



1880 1890 1900 1910 1920 1930 

TATAGTAAACCACTATCCCAACCTGCTTTTAGTTCCAAAAATAGTTTCTT^^ 



1940 1950 1960 1970 1980 1990 

TGT TC CT C TACTTTGTAGGAAGTCTTTGCATATC 



— AAGTCTT--CAATACGC— - 

1740 

2000 2010 2020 2030 2040 2050 

GACTGGCCAAGACTGtnTATCCCAACCCTTCCATTTAACAGGATTTC 



2060 2070 2080 2090 2100 2110

CAACTAGCTAT1TTTCAGAAGACAATAATCACC(3CTTAATTAGAACACGCTGTAT^^ 



2120 2130 2140 2150 2160 2170

CCCAGCAAA(:A<rrn;T(:GCt:ACACTAAAAACAATCATAtXATTT^ 
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2180 2190 2200 2210 2220 2230 

CACATCTCATGTTTTATCATTTGGATCGAGTAATTTAAAATGAATO 



AATTTATAATGTTTTGGATTC 

1750 1760 

2240 2250 2260 2270 2280 2290 

AATGGAAGCATTtXICTGGCAGATGTCACAACAGAATAACCACTTO 



AAACATT — 

1770 

2300 2310 2320 2330 2340 2350 

AGTCCTCOVGCrTGATCAAAAATTATTCTGCAT^ 



TACX3TAGTAGTC ^^^^^^ 

1780 

2360 2370 2380 2390 2400 2410 

TGTACTTCTTOIATTTGGAAACTTTTCT^^ 



CTTGAAGAGAA 

1790 

2420 2430 2440 2450 2460 2470 

CmAAGAAAACCAGTXrrGGCCTTTTrCCCTCTAGC^ 



CAATAA 

1800 

2480 2490 2500 2510 2520 2530 

ATGCTCTAGGTTATAGATAAACAATTAGGTATAATAGCAAAAATGAAAATTGGAAGAATG 



— TTTATTGGCTATATTCATA — — 

1810 1820 

2540 2550 2560 2570 2580 2590 

CAAAATCGATCAGAATCATCCCrrCCAATAAAGGCCTTTACACAT^^ 



2600 2610 2620 2630 2640 2650 

TTATCAAATCACAGCATATACACAAAACACTTCGACTTATTGTAT G TT^ 



2660 2670 2680 2690 2700 2710 

CirrCGCCCTAACCACTTCTTTCTAAATCTATCTC 



2720 2730 2740 2750 2760 2770 

ACCTGTTTGCTCTCCTTCCACCCCACGTAAACCTCCATT^ 



FIG. 39 (cont.) 
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- — CCCA TATAAG 

1830 

2780 2790 2800 2810 2820 2830 

CAGATGGAGCACTGTCACTTAGACATTCTCTOCSGGGATTTT^ 



ACTGTATCTTA 

1840 

2840 2850 2860 2870 2880 2890 

TTTTTGGAAGGATAATTCTGATAAGGCACTCAAGAAACGTACAACCA^ 



, CAGTGCA 

1850 

2900 2910 2920 2930 2940 2950 

AAATCATATGAGAAATACTATGOlTAGCAAGGAGATGCAGAGCaSCCAGGA;^ 

CAGA 



2960 2970 2980 2990 3000 3010 

GTTCCAGCACAA r rTT Cr rTGGAATCTAACAGGAATCTAGCCrro 

ATTCC— CAC " OC 

1860 

3020 3030 3040 3050 3060 3070 

TCCATTTCTATGTCTGCTATTTGGGGG rnTOl ^ ^ 



- TGCTTT 

1870 

3080 3090 3100 3110 3120 3130 

AAGTTCACTGAACACCAAGACCACAATGGATTTTTTTAAAAAAAT^^ 



3140 3150 3160 3170 3180 3190 

GAAGCACCTTCATTCCTTGATTTTCATTTTTTGCAAAGT^ 



TACTTTTGA 



3200 3210 3220 3230 3240 3250 

AATCAAATCAATG'rTTAGTTCACAACTAGATCTAATTTACTAAAGAATGATACACCCAT^ 



3260 3270 3280 3290 3300 3310 

TCCTATATACAGCTTAACTCACAGAACrCTAAAAGAAAATTATAAAAT^^^^ 



---AAATAAAA*: 
1880 
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3320 3330 3340 3350 3360 3370 

CCATCTTTTT A GTGATMTAAAAGAAAGCATGGTATTAAACTATCAT^ 



3380 3390 3400 3410 3420 3430 

AAAAGAAAAAAGGACiraiTGGCATTATTAATATAATTAGTGCTCT 



3440 3450 3460 3470 3480 3490 

ACATATTAGAAGCATATTT ^CT ACTAAGGCTAGTAGAACCACATTTC 



„ ^,^,p — TTTCCC ~- 

1890 

3500 3510 3520 3530 3540 3550 

CCTTAAACACTCATGCCTTATGATTTTCTACCAAAAGTAA^^ 



— — — ----XTGTAAAAAA— 

1900 

3560 3570 3580 3590 3600 3610 

AGGAAGATGCCTCTCCATTTTCCCTCTCT^ 



3620 3630 3640 3650 3660 3670 

TAAAAGCTCTGGGAAGACCnnTGTAAAGGGACAAGTTGAG G TT G TAAAA 



3680 3690 3700 3710 

AATAAACATCTTTGATCACAAAAAAAAAAAAAAAGGGCGCCCG 



AAAAAAAAAAAAAAAGGCCCGCCG 
1910 1920 
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10 20 30 40 
WMAri GTCGACCCACGCGTCCGGGCTCATGGCGCCGGC— -GTCGCGGT TGCTC— 

M 1/ W M € GTCGACCCACGCGTCCGGT-TCATGGCGGCGGCTGGGCGGCGCCXmTO C T^^ 
10 20 30 40 50 

50 60 70 80 90 100 

-GCGCTCTGGGCGCTGGCGGCTGTGGCTCTACCCGGCTC 

TGTACTATGGATGATGGTGACTGTGATTCrCCCrcCCrc 
60 70 80 90 100 110 



110 120 130 140 150 160 

GGTGGCGCCCGGGCGGGCCGGGGGCCGTGGCGGAGGAGGAGCGCTGCACGGTGGAGCGTC 

GAAIXUSGCT-GGGAATTGCAGCAGCAGTAATGGAGGAGGAGCGTTGCACAGT^ 
120 130 140 150 160 170 

170 180 190 200 210 220 

s GGGCCGACCTCACCTACGCGGACTTCGTGCAGCAGTA(XM:CT^ 

■ 4«« • 

• •• •■■ • ••*•*«*•**•••• ■■ ■»••■* 

GGGCACACATCACGTACTCCGAATTCATGCAGCACTATCCCTTCC^ 
180 190 200 210 220 230 

230 240 250 260 270 280 

s TGCAGGGACTCACGGACAACnXAGGTTCCGGGCCCTGTCCTCCC^ 

• >••••■••••■>•>•••■*■**•«****•>■•■■•••■• •« •••••• 

TGCAACCACTCACGGACAACTCGAACTTCCGGGCrCTGTGTTC 
240 250 260 270 280 290 

310 320 330 340 

CGGCTGAGCACCCCCAACACCTACTCCTACCACAAAC 

•• •••• *. 

CCCTTGACTACACCCAACACCTACTCCTACCAGAAAG 
330 340 350 

350 360 370 380 390 400 

TGCACTTGCCCrTCCAGGACTATCTCCAGCACCrCCTCCACCCCCA^^^ 

TCCACCTCCCCTTCCAGGAATATCTGCAACAGCTGCTGCA^^ 
360 370 380 390 400 410 

410 420 430 440 450 460 

TCCGCAATGACACCCTCPACTTCTTCCCGGACAACAACTTC 

TACGCAATCACACCCrGTACTTTTTTCCAGACAAC^ 
420 430 440 450 460 470 

470 480 490 500 510 520 

TTCGCCACTACTCCCCACCCCCATTTGGCCrCCTCGGAACCC 



290 



300 



CAACAt 



300 



310 



320 
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TCaVGCACTACTCTCCGCCACCATTCCGTCTCCTGGGi^ 
480 490 500 510 520 530 

530 540 550 560 570 580 

GAATCGCAGGAGCTGGCTCGGGGGTGCCCTTCCACTGGCATGGACCCGGGTACT^ 

GAATTGCAGGAGCTGGATCTGGGGTACCCTTCCACTGGCATGGGCCTGGTTTC^ 
540 550 560 570 580 590 

590 600 610 620 630 640 

TGATCTACGGTCGTAAGCGCTGGTTCCTTTACCCACCTGAGAAGACGCCAGAGTTCCACC 



TTATCTATGGTCGGAAGCGCTGGTTCCTCTACCCrrCCTGAGAAGACACCTGAGT^ 
600 610 620 630 640 650 

650 660 670 680 690 700 

CCAACAAGACCACGCTGGCCTGGCTCCCGGACACATACCCAGCCCTGCCACCGTCTGCAC 

CTAACAAGACCACATTGGCCTGGCTGCrcGAAATATACCCATCTCTAGC^ 
660 670 680 690 700 710 

710 720 730 740 750 760 

GGCCCCTGGAGTGTACCATCCGGGCTGGTGAGCTGCTGTACTTCCCCGACCGCTGG^ 

*• ■« «* mm m m A m * m 9 

GGCCTCTAGAATGTACCATCCAGGCTGGTGAAGTACTGTATTTTCCTC 
720 730 740 750 760 770 

770 780 790 800 810 820 

ATGCTAOTCrcAACCTTCACACCACCGTCT^ 

• ••• ••«■• •«••»•■• « B 

• ■■•■•«»• •«*«*•■• •■ •■>■■••• • 

ATGCCACACTCAATCTGGACACCAGTGTCTTCATTTCTA 
780 790 800 810 820 830 

830 840 850 860 870 880 

AGCTGGCAGGACTGCCGGTCACA-CACCAGCACGTCCCACC-TCCTGCTCACTC 

ACCCAACTCCCAACCC— -CACTGCACCAGCACATCCCAATGTAGTGCTCACAGACrTTA 
840 850 860 870 880 890 

890 900 910 920 930 940 

TTACACAGATACTCCCGGCAATGGCCTCAGCCCACCCCACCCTCACCT G CTTT ^ 



TTACA -GGACAGTCGCAGCAGCAGCAAC- -CTCAGCCCACCCTCACCCACTCT-CCAGCC 
900 910 920 930 940 950 

950 960 970 980 990 

CACAAAGCCCGACGA TCACGCCCCACCAAAACCGATCCTGACACCGGAAACAG 



CA-GAAGCGCGACAAGGGAGGCTCATCCTCCAGCAACCCGTATGCTCAGAAGGGCAGCAG 
960 970 980 990 1000 

1000 1010 1020 1030 1040 1050 

TCCAGAGTCCAACACCAC/XACrTGGGGGAAGCCCTCGCGGTGGCCAGGAACATAAACTA? 



TTCAGAACCCATCACCAGGCCC-GATCGGGGCACCC CCACGGACACAAACTAT 

1010 1020 1030 1040 1050 1060 
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1060 1070 1080 1090 1100 1110 

gtatagggck:cgggcwcttctg-c-ccagwgctcccctg^ 

; t M I t .i.'.zri • • » • ••••• 

ACA---GGGACTGGAGCTTCCGTCTCCAGATC-CTCCTGGGCCAGGGT^ 

1070 1080 1090 1100 1110 

1120 1130 1140 1150 1160 1170 

AGGGAACCTCAGTAGTCCTCCACCCAGCCATTCTCAGAGATGAATGCGTCAATAACCTCC 

ATGGGGCCTCAATAGTCCTCTACCCAGCCGTTCTCAGAGATGAAAGCGTC 
1120 1130 1140 1150 1160 1170 

1180 1190 1200 1210 1220 1230 

rrCATGGCCAAGTTGGGGATGAGCTGTTCCTC^G^ 

1180 1190 1200 1210 1220 1230 

1240 1250 1260 1270 1280 

AAATGACCCACACGCTGCA— -GTGACAAGAAGGG-CAGAGGGCAGTCATGG— GGCCCA 

AAGTGGCCCACACGCTGCAACAGAGTCAAGAGTGTTCAATGGCCTGAG 
1240 1250 1260 1270 1280 1290 

1290 1300 1310 1320 1330 1340 

GG-ACCATGCCACT GGCCCTG-CTCCCCCAGCCGCAGGCCTCACCTGCAGGTCCTC 

:: ::::: : :::: ::: :::::::::::::: 

GGTACCAAGGCrCTCCATGGCCCGGTCTCCATXK;CCC-CT--CCITA 
1300 1310 1320 1330 1340 1350 

1350 1360 1370 1380 1390 1400 

CTCGATGTCCTTGCGGTCGTAGGTGATCCCACTGGGCGTGATGCACTC 

CrCAATCTCCTTGCGGTCATAGCTGATACCACTCGGTGTAATGCAGGGTTCCCGCATC 
1360 1370 1380 1390 1400 1410 

1410 1420 1430 1440 1450 1460 

CTCAAAGCTCATXTTTGCCACACAGGTAGTCCCGGATCTCTCGCTTCT^ 

CTC,^GCTAATCTrCCCACACAAGTACTCACCGATATCTCCCT^ 

1420 1430 1440 1450 1460 1470 

1470 1480 1490 1500 1510 
ACAOSGTCAGAGCCTCAAAACCCCCACTCCACGAGCACC-TGCCACCCATC^ A 

AAAATCTCTACAACTGGAC-CGCGCTGTGCG-GGTCACCATACCAGC-ACCACCCOATC 
1480 1490 1500 1510 1520 1530 

1520 1530 1540 1550 1560 1570 

CCAAGCCACACACACTCACCTTCCTCTTCTCATCCACCTCAGAAAAAAGCrrCGTCCAT^ 

CCTTCCGa;GGTC -CTCACCTTTCrrrrCTCGTCCACC^^ 

1540 1550 1560 1570 1580 1590 

1580 1590 1600 1610 1620 1630 
CC.:CCA rGTACTTCTCCTCTGAAGACTTCACTCCTGTGCTTCGCGCA GACACCCC 
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CTGCCATGTATTTATCCTG — CAGAGTTGAGTGCCATGTGTGGGCAACTCCTGTCTCCAC 
1600 1610 1620 1630 1640 

1640 1650 1660 
AC CTCCC TCCTCCATGGGGCACA-GAC CCAACA CA- 

ACAGACACACACACTCTGTCCACCAGGGCACTCATGTCATGCATGGGCCAACAGATCCAC 
1650 1660 1670 1680 1690 1700 

1670 1680 1690 1700 1710 

- - - AGGCGGGGATGCT— -C— CCACGCCACGTGCACACACACA— GACCCACATGTGG 

CAAAGGCTGGGGCACTTTTCATGCCACAC - ACAAACACACACACAATGACCCACATGTGG 
1710 1720 1730 1740 1750 1760 

1720 1730 1740 1750 1760 1770 

GTGOGGGGCACCCTCACGIXXTTOGCCrCAATGC^ 

•••■•••••*«••>*■■■•■■■*■*•■*■*••-•■•■•••*■«■•«*« 

ACTAGGGGCACCCTCACGTGCTTCGCCTCAATGCAGGCC^^ 
1770 1780 1790 1800 1810 1820 

1780 1790 1800 1810 1820 1830 

TOTrcCTCATCACCCTCGTOGTTTCGCT^ 

• *•••••.*•••• «••*•••*••• 

TCATCTTCATGACCCTCGTGGTTCCGCTGACACTCCTCCAGTO 
1830 1840 1850 1860 1870 1880 

1840 1850 1860 1870 1880 

GAGCCGGTCAGAGATGGACCTGCCCAGATGT* --CTGACCACACCCCAATCTt^ -GC 
•••• •••••••• •••••••••■•«• • : 

AAGCTAGTTGGTGATGGCCCTGACCAGGAAATCACAGAGCCCGCCCCA*TC^^ 
1890 1900 1910 1920 1930 1940 

1890 1900 1910 

TAACATCCACA-CTTCCC CACATTT-C 

••*«•• •••••• 

TTTCCTCCTGGCCTTCCCATCTACCGCTTCTTGTCCTTCAATA 

1950 1960 1970 1980 1990 2000 

1920 1930 1940 
CTCCTTG CCAGTAAAGC CTTCGATAAAC 

CACTCAGTCTCrCCTGCCGGACCGACCCACCTCT^ 

2010 2020 2030 2040 2050 2060 

1950 1960 1970 
AAAAAAAAAAAAAAAAAAAACGGCGGCCG 



AGATATGAATGCAAAAAAAAAAAAAAAGGGCGCCCC 
2070 2080 2090 2100 
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10 20 30 40 50 60 

/Ajkxh»£ aattcggmwcmkkkgvvggwgcccx;tgc»gtgagaggatgggcgagcagtctga^ 

(jiiMnI O Q..xcGACCCACGCGTCCG--GCTGGCGGAGCAGGAGGATGGGCGAGCAGTCTG^ 
10 20 30 40 50 

70 80 90 100 110 120 

AGAATGGATAACCGTTTTGCTACTGCGTTTGTGATTGCTTGTGT^^ 

5 2 J • J • JJ^ :::::: : : :::::: 
AGAATGGATAACCGTTTTGCTACAGCATTTGTAATTGCTTGTCT^^ 

60 70 80 90 100 110 

130 140 150 160 170 180 

ACCATCTACATGGCGGCCTCCATAGGCACGGACTTCTGGTATGAGTATCGAAGTC^ 

ACCATCrA(^TC<^ 
120 130 140 150 160 170 

190 200 210 220 230 240 

CAAGAGAATTCAAGTGACTCGAATAAAATCGCCTGGGAAGATTTCCTCGG^ 
: : ; : : : :::::::: : . ::::::.::>::: : . : : : : 
CAAGAAAATTCCAGTGATTTGAATAAAAGCATCTGGGATGAATTCATTAGTGATC 
180 190 200 210 220 230 

250 260 270 280 290 300 

GATGAGAAGACTTACAACGATGTTCTGTTCCGATACAACGGCAGCTTGGOT 

::i .:: :::: :::::::: 

GATCAAAAGACTTATAATGATGCACTTTTTCGATACAATGGCACAGTGGGATTGTG^ 
240 250 260 270 280 290 

310 320 330 340 350 360 

CGGTGCATCACCATACCCAAAAACACTCACTCGTATGCCCCACCGGAAACCACAGAGTCA 

cggtctatcaccatacccaaaaacatccattggtataccccaccaga;^ 

300 310 320 330 340 350 

370 380 390 400 410 420 

TrrCATGT CG TTACCAAATGCATCACTTTCACACTAAACCAGCAGTTCAT^ 

TTTCATCTCCTCACAAAATGTCTGAGTTTCACACTAACTCACCAGTTCATGCAGA^ 
360 370 380 390 400 410 

430 440 450 460 470 480 

GTCCACCCCCCCAACCACAATACCGGCATCGACCTGCTTCGCACCTACCTGTGCCGCTC 

CrrGATCCCGCAAACCACAATAGCGGCATTCATCTCCTTAGGACCTAT^^ 
420 430 440 450 460 470 

490 500 510 520 530 540 

CACTTCCTTTTACCCTTCGTCAGCTTCCCCITGATCTGC^^ 
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CAGTTCCTTTTACCTTriTCTGAGTTTAGGTTTC 
480 490 500 510 520 530 

550 560 570 580 590 600 

TGTGCCTGTATCTGCCGCAGCCTGTATCCCACCCrCGCCACTG^ 

TGTCCTTGCATTTGCCGAAGCTTATATCCCACCATTGCCACGC^^ 
540 550 560 570 580 590 

610 620 630 640 650 

GCAGGTCTGTGCACA CTGGGCTCCGTGAGTTGCTATGTTG--C--CGGCATTGA-- 



GCAGGAAATTACTCAGATTCITGGCTCCATGAATAATTTTAATGATC^^ 
600 610 620 630 640 650 

660 670 
ACTC TTACATC-^ AGAAAGTAG— 



# ♦ • 1 



TTOATAATTACTCATTTCTCAATAATCTTTTAATT^ 
660 670 680 690 700 710 

680 690 
AGCT GCC CAAGG— ATGTATCTGG 



TCCAAGCTCTTTAAATGGCCTTACAAACTCATTGGC^ 
720 730 740 750 760 770 

700 

AGAATTT GG ATGGT t C 



ACCTTTTAGTTTTTCCAGTGGGCCATGCCTATGGTAGT^ 
780 790 800 810 820 830 

710 

CTTC TGC : CTGGC 



CTTCCATCAATCTTGCATTCACATTCCCATCCCCTT^ 
840 850 860 870 880 890 

720 730 
— CTC- — T CGTCTC GCC -TC 



TTTCACCAATAGAGTCTGCCTGAAATGACACTCTTCTCATCACCTCCTAAAGAT^^ 
900 910 920 930 940 950 

740 

-CCTTA CACTTC 



TCCTTAAACCACTTCTCrrCCAACACTCACTCTTACAACATTCCCTCTCCAAACCCAGAT 
960 970 980 990 1000 1010 

750 760 
ATCCC - -CGCCGCT - - - - -CF CTTCATCTC 



ACCATCCT'JTGAAGTCCAOGCCACATCCACCTCTCCTCrrGTAGATGCTCCACCTCAAA'rC 
1020 1030 1040 1050 1060 1070 
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770 780 790 
-GGCTGCCCACA CCAACCG-GAAAGAGTAC 



CCAAGCTAAGCrcCCAACTGACAGCCAACATCATTTCCAGCC^ 

1080 1090 1100 1110 1120 1130 

800 810 
ACCTTAA TGAAGGCTT ATC 

GGATGTCCAGCCTTAACAAGCCrrCAGAGGACTTCAGC^ 

1140 1150 1160 1170 1180 1190 

820 830 840 
GTGTGGC ATGAAGGG AGGCTG CCTG CT 

CCTTGTGAGACTCTAATAAAGAACCAACTAGCrcAGCCCAA 

1200 1210 1220 1230 1240 1250 

850 860 870 
TAATGATTAATATTTTT CATACATTTTTTT 

GAAATAAAATCAATTGTTGTrTTGTGCCGCTAAAAAAAAAAAAAAA^ 

1260 1270 1280 1290 1300 1310 



GGCGCCCCC 
1320 
10 20 30 40 50 

Hc^MA U GTCGACCCACGCGTCCGGCGGCTAGCXrCCGCGTGCGCTGGAGACCTCCGCGCTGCCC^^ 

• • •••••• • Va •«* 

Mi>Rj HC TCCG-GTCCAN-GAAAAAGCr-GCTTGCACrAGGGGCATCC-CGCCTC 

10 20 30 40 

60 70 80 90 100 110 

. CGCGAGCCTCCTGCCCTGGCCCGCCGCTGCGGCTCTCCCGCG<^ 
• ■**••* 

TGAAAGGAACCG- -CAGCACACAGGGTGGGAGGCCTTCCG- - ATTTTAGCA-GGGCGGCT 
50 60 70 80 90 100 

120 130 140 150 160 170 

••••••• * • ••• *•••••••• ■ *• ■ 

TCCGGAACGCGGAGCTC--CAACCCCATTTCCT--TTCTCTGGGCTXX3T^^ 

110 120 130 140 150 160 

180 190 200 210 220 230 

TGCATTTACAGGCTGACCCOGGGTCCCCGGCGGGGCGACCCCGAOCrc^ 

TCCACCTGCCTG 'TCGCCCTGGCTCCTCGGCT C -CCTGC - AGCTCCGAGGCAGCAGC 

170 180 190 200 210 

240 250 260 270 280 290 

TTCGAAGTC-CGCAGCTCCCCTGGAAGAAGGCACGTCAGAG- -GGTCAGTTGTCCGGCCC 

«••>■■• •••• saca aaaaaaa 

• • aaaa a aaaavaa • aaaa aaa **»a aaaaa aaaa a aa aaaa 

ATGGCTGGCCCCCGCCA - -CGTCGGCTGGGTGGCACCAGGGCrCCTCCrGCCCGCCGGCG 
220 230 240 250 260 270 

300 310 320 330 340 

CTCGGC - -C CGGCCT-CaAGACCCCAGGTACCTGGGAGTCACACTG -GTCCAAG - A 

C-CTGCTACTCTATCTACCGGCTCACTCGGGG-ACCGCGGCGAGGCGTCGCGACC.ATGCC 
280 290 300 310 320 330 

350 360 370 380 390 

CC- -TCGCAG-CC- -TGAAGACTTAACTCATGCTTCATATCATCATGTTCTAAATGCTCA 
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CCCTTCGCGATCCGCAGAAGACCTAACCGATGGCTCCTATGACGATATC^ 

340 350 360 370 380 390 

400 410 420 430 440 450 

. ACAACTTCAGAAACTCCTTTACCTCXrrGGAGTCAACGG^ 

GCAGCTTAAGAAACTTCTGTATCTGCTCXSAGTC^^ 

400 410 420 430 440 450 

460 470 480 490 500 510 

. AGCTTTGATTACTTTGGGTAACAATGCAGCC^^ 

^JJ »♦♦«•••• •«••••••■•• as ■ ••••••••• 

GGCCTTGGTCACCTTGGGAAATAATGCAGCCTTCTCC^ICTAACCAGGCCAT^^ 

460 470 480 490 500 510 

520 530 540 550 560 570 

ATTGGGTGGTATTCCAATTGTTGCAAACAAAATCAACCATTrc 

_***'*JJS2ZZ m 

GTTGGGTGGTATCCCAATTGTTGGAAACAAAATCAAC--TC^ 

520 530 540 550 560 

580 590 600 610 620 630 

GAGAAAGCTTTAAATGCACTAAATAACCTGAGTGTGAATGTTGAAAAT 

GAGAAAGCTTTAAATGCACTGAATAACCTGAGTCTGAATGTTGAAAATC^ 
570 580 590 600 610 620 

640 650 660 670 680 690 

AAGATATACATCAGTCAAGTATGTCAGGATGTCTTCTCTGGTCCTCTCAA 

; : X* 

AAGATATACGTCCCTCAAGTCTGTGAGGACCTCTTTGCTGAC 
630 640 650 660 670 

700 710 720 730 740 750 

CAGCTGCCTGGACTGACATTGTTGACAAACATGACTGTTACCAATGACCACCA^^ 
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T182 .hum.pep M^lWlQARVLVAAVVGLVAVlI*Y7|SIHKIEK^^ 

TX82.mi2S,pep MIMXJARU.VMVVGLVimXY?(SIHKIEEG 

•nSl . hum. pep MAQLGAWAVaSSFFCASLFSPjVHKTEECmiG^ 

T181 .mus • pep MAQMAVVAVASSFFOSLFSWHKIEEGHIGVYYI^^ 



T132 . hum. pep TLQTDEVKNVran^SGGWIIYiDRIEVVNMIJ^ 

T182 , mus , pep TLQTDEmiVPCGTSGGVItrniDRIEVV^^ 

T181 . hum . pep Tl/7WEm^JVFCGrSGG\/^ 

T181 .mus -pep TLQTD£\^(MVPCGrSGG\/KnCTRIEVV^ 



T132 .hum.pep HTLQEVVIELFIXJIDQJLKQALQKDLIOIA^ 

T182 .mus . pep KTLQEVYTELFDOIDE^lriOQALQKDLN^^ 

TISl . hum.pep hTWEVYIELFIXJIDEWLKLAL^ 

TI31-mu3.pep HTLQEVVIELFIXJIDENUCLALQQDLTSM?^^ 



T182 . hum* pep XCKQKVVEKEA^RKKAVIEAEKIAQVAKIR^ 

T132 . mus . pep AQKQKVVEKTAETEI^KRAVIE^VEKIAQVA^ 

T131 . hum . pep AQKQKWEKE?^£TEIiK^^ 

T18 1 . mus . pep AQKQKWEKEAETERKIC^LIEAEKVaQV^^ ... 



T182 . hum.pep YAAHK^aTSNKHKLTPEYLEIJCKYQALAS^E 

T132 .mus . pep V.:WiKYATSNKHKLTPEVLEIiCKYQAIASN^ 

T18 1 . hum . pep VT.^MKIAEANKUCLTPEYLQLMKVKJ^IASN^ SI«IFBGLADK 

C42CI . a VK^KQADSNKILLTKEYLELQCTRAIASN^ -MCTTQQTV 



TI32 . hum-pep EALEPSCE;^iVTO— NKESTC 
T132 . mus . pep EAHEPSCESPIQ— NKQttC 
T13 1 . hum. pep LSFCLE-DEPLEtATKaj 



flGiMl 



i > 
10 20 30 40 50 60 

inputs MATLWGGLUUiGSIJiSLSCSALSVIXUUiliSDAAlQIFEDVRCKCICPPyKEH 

LLSLVAW- -GCL LVPFAEANKSSEDIRCKCICPPYRKISGHIYNQN 

10 20 30 40 

70 80 90 100 110 120 

inputs ISQKDCZXXHVVEPMPVRGPDVE;iyCIJtCEClCyEBI^SVTI!^ 

VSQKDCNCXHVVEPMPVPGHDVEAyCIXCECRYEERSTTTlICVTIT^ 
50 60 70 80 90 100 

130 140 150 160 170 180 

inputs YLTLVEPItlOmLFGHT^ILIQSDDOXaOIIQPFANAHDVXJ^ 

. . : : . : • • s . . : . : . : , . : - . . : s • • s s s s ; j ; s 

FliMLVDP-LXRKPDAmQUIMEEENEDARSHAAAAASLW 

110 120 130 140 150 160 

190 

inputs LQVQEQRKSVPDRKWLSN 
•••■■■■•«■>■** 
LQVQEQRKTVFDRMKMLSN 
170 180 



FIG. 43 
10 20 30 40 50 60 

inputs MASLHCGNLIJaiGSOLSMSCXALSVIJXAQLT^^ 

: .:: ; . . •:: • 

N KUOiVAW- -GO* LVPPAQANKSSEDIRCKCICFPySHZSGHZiarQ 

10 20 30 40 

70 80 90 100 110 120 

inputs NlSQKDCDCIJnn^EPMPVRGPDVEAYCXRCEGKYEERSSV^ 

NVSQKDCNCLKVVEPMPVPGHDVEAYCLLCECRyEERSTTTil^ 

50 60 70 80 90 100 

130 140 150 160 170 180 

inputs VyL1i:.VBPi:UCRIU.raaSQIXQSDDDVGDRQPFAIKAin3^^ 

«.s « • • • « ••• ■•■■■>>> 

Ara^LVOP-LIRKPOAYTEOLHIIEEEIIfiDARTI^^ 
110 120 130 140 150 160 

190 200 
inputs KLQVQEQRKSVFDRHWLSK 
•■•«***•••••■•> ••»■ 
lOKTVQEQRKTVFDRHKmiSK 
170 180 
Input file T187human1; Output File T187human1.pat 
Sequence length 2490 

TTCCCGCCCCCGGCGTCTCCCan^CGGCCCCMXGTCCGACCCCXCCCTCCCGCTCTG^ I5S 

CTCGCCTGGGAGAAGCCGCCCGGACGCCCCGCCCTCGACTGGGCGGTTArAGGCTTTGAGCrAGGCCGnTCOSGG^ 237 

CGCAGCTCAGACCCCATTTCCTrTCTCCACArcCACGrCAGGTCGCCTTTGCTGTGGCGGCTACGCCCGCCTGCGCTGG 316 

N 6 2 

AGACCTCCCCGCTGGCCCCCGCGAGCCTCCTGCCCTGGCCCCGCCCTGCGGCTCTGCCCCGCCGGCACC AT6 G6T 391 

GPRGAGVVAAGlLlGAGACr 22 

GGC CCC CG6 G6C GCG GCC TGG GTG GCG GCG G6C CTG CTG CTC GCC GCG GCC CCC TGC TAC 451 

CIYRLTRGRRRGORELCIRS 42 

TGC ATT TAG AGG CTG ACC CGG GGT CGG CGG CGG GGC GAC CCC GAG CTC GGG ATA CGC TCT 511 



SCSAGALEeGTSEGOLCGRS 62 
TOG AAG TCC GCA GGT GCC CTG GAA GAA GGG AC6 TCA GAG GGT CAM TTC TGC GGG CCC TCG 571 



ARPQTGGTUESQUSICTSxPE 82 
GCC CGC CCT CAG ACN GGA GGT ACC TGG GAG TCA CAG TGG TCC AAG ACC TCG CAN CCT GAA 631 

DLrDGSY00VLHAEOLOrLL102 
GAC TTA ACT GAT GGT TCA TAT GAT GAT GTT CTA AAT GCT GAA CAA CTT CAG AAA CTC CTT 691 

TLLeSTeOPVIIERALITtCl22 
TAC CTG CTG GAG TCA ACC GAC GAT CCT CTA ATT ATT GAA AGA GCT TTC ATT ACT TTC GGT 751 

NNAAFSVNQAt IRELGGIPIU2 
AAC AAT GCA GCC TTT TCA GTT AAC CAA GCT AfT ATT CCT CAA TT6 GCT GCT ATT CCA ATI 811 

VANKINNSM0$||CE1CALNA1162 
CTT GCA AAC AAA ATC AAC CAT TCC AAC CAG AGT ATT AAA GAG AAA GCT TTA AAT GCA CTA 871 

NNLSVIfV£N0IICIICVQVLICLia2 
AAT AAC CTG AGT GTG AAT GTT GAA AAT CAA ATC AAG ATA AAG CTG CAA GTT TTC AAA CTG 931 



ILNLSENPAHTEGLLRAQVO202 
CTT TTG AAT TTG NCT GAA AAT CCA GCC ATG ACA GAA GGA CTT CTC CGT GCC CAA GTG CAT 991 



SSFLSLT0SHVAJCEILLRVL222 
TCA TCA TTC CTT TYC CTT TAT GAC AGC CAC CTA GCA AAG GAG ATT CTT CTT CGA CTA CTT 1051 

rLFONtKHCLICIECHLAVOP242 
ACC CTA TTT CAG AAT ATA AAG AAC TGC CTC AAA ATA GAA GCC CAT TTA GCT GTG CAG CCT 1111 

trT6CSlFFLLHCE£CAOlCl262 
ACT TTC ACT GAA GCT TCA TTC TTT TTC CTG TTA CAT GCA GAA GAA TCT CCC CAC AAA ATA 1171 

RALVOHHOAEVICEKVVTI fP2S2 
AGA GCT TTA GTT GAT CAC CAT GAT GCA GAG GTG AAG GAA AAG GTT CTA ACA ATA ATA CCC 1231 

» • 255 
AAA ATC TCA 1240 

rTGGrCATArTTTTCCAAAGAGTAATGCAGTCrGCATArAAATGTATrrTCTGTCTTCCTTATAAGCCCArrCTCCCAG 1319 

CrCCTAAArTrAAACAGTAAATATCACATTTTCTCATTAACACAGCTATAACTTGCCCTCCTTCTCACATTTATTTTCC 1398 

ACTATTTTGATGCCAAGTCAATATAACACCTTCTACTCAAACCATTTArTTCTTTCTATTTTGCTATriGCAAArCCTT 1477 

GTTATCTTCCCTACATCAAGrCGCAGTAACCTTTTTCACATTTAAGCTACCCTTCTACCTTTTCAAGTGATTTGCACTT 1556 

ACTCATCTCACACAGCATCACTATTTCACTAAAfCATTCTTTCACAACTCAATACTCTTCTTCTTTTACTACCAATGAA 1635 

ATCCTAACCTCTTCACCCCATTCACCTCCCAACCTCACCATACTCCTTTCAAAAGTCTTTTCTCArCAGTAGAATCTAr 1714 
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mCGTCACTTCTACTCAATCAAAAATGTAMCTTTTAGWGACMTCTTTCCTACW 1793 

TACATATAAiWrAGTCTCATCMTCACAATCTCCArcmAiUCACTTCCTTAAATAM 1872 

CCGTGCTCCGCCCXCTGCCTCTTGCCTGTAATCrCAGCACTTTGGGAGGCrCACCCGGGCAGArCACXT^ 1951 

GmGAGACCAACCCTacCAArATCGAGAAACCCrCTCTCTACrAAGAATAC^ 2030 

GCCr6TAArCCCAGCTACTTGGGAGCCaUGCCAGCA(MATT(XTTMCCCCCGACC« 2109 

ATAarCCCATTCCACTCCAGCCTGCGCAACAACAGCAAAACTCTGTCTCAAAAAAAA^ 2188 

TGTGCTTAAGTGGAAAGATATCTATGAAATATGCTCGTnnTAAAACACAAAAAmrAGAATATGGM 2267 

TGTOTCTGTCTGTGTGTGTCTGTCTGTGrCTGTGTCTGTGTCTTTCAATGAAAAATGCTTATCTATTGACAGAACACTT 2346 

CTACAATGATACCCAAACTCCTCCAGTGCCAGTCCCGAATCCCTTCTACCTACACACTCTTCTACTCnTGAAm^^ 2425 

AArATGACCCCAAATTCTArAArCTrrTTTTAATAAAGGGGAGAAAAATCAAAAAAAAAAAAAAA 2490 
Input file T187human23; Output File T187human23.pat 
Sequence length 2595 

CCACCCGTCCCCCCA(»CKCTCGACGCA«yUTCCTTtt^^ 79 

n(^scccccccccTCTccca;Taa;ccccAca»T(xcAccc ccc c c T cax CT tsa 

CTClSXTGGCAGAAGCCCCCCGCU(XC(X(XGCCTGGACTCGGCf»TTATACGCmM 237 

CCCAGCTCACACCCCATTTCCmCTCCACATCCAGGTCA<»T6CCGTrTGCT(n^GGCCCCTACGCCC^ 316 

N G 2 

AGAfXTCCCCGCrCGCCCCCGCGAGCCTCCTCCOTGGCCCGCCCCTGClUSCTCTGCCGCCGC ATQ GST 391 

GPRGAGUVAAGLLL6ACACY 22 
GCC CCC CCG GGC CCC CCC TGG 6TG GCG GCG C6C CTG CTG CTC GGC CCG GCC GCC TCC TAC 451 

ClYRLTRGftRRGO RELGIRS 42 
TGC An TAC AGG CTG ACC CGG GGT CGG CGG CGG GGC GAC CCC GAC CTC GGG ATA CGC TCT 511 

SKSAEDLTOGSYODVLItA EQ 62 
TCG AAC rCC CCA GAA GAC TTA ACT GAT GGT TCA TAT GAT GAT GTT CTA AAT GCT GAA CAA 571 

LOKLLYLLCSTEOPVIIERA 82 
CTT CAG AAA CTC CTT TAC CTG CTG GAG TCA ACG GAG GAT CCT 6TA ATT ATT GAA AOA GCT 631 

I.ITLCNNAAFSVIIQIPN1CIV102 
TTG ATT ACT TT6 GGT AAC AAT GCA GCC TTT TCA GTT AAC CAA ATC CCT AT6 AAG HG GTC 691 

TGITFAtfRElGGIPIVANICl22 
ACT CGC ATC ACA TTC GCT ATT ATT CCT GAA TTG CCT CCT ATT CCA ATT CTT CCA AAC AAA 751 

lttHSII0$tKEKAtllALNNL$142 
ATC AAC CAT TCC AAC CAG ACT ATT AAA GAG AAA GCT TTA AAT CCA CTA AAT AAC CTG ACT 811 

VttVENO|KllCIYlSOVCE0V162 
CTG AAT GTT GAA AAT CAA ATC AAG ATA AAG ATA TAC ATC ACT CM GTA TCT GAfi GAT CTC 871 

FSGPLIfSAVQLA6L TltrNNl82 
TTC TCT GGT CCT CTG AAC TCT GCT GTC CAC CTG GCT GGA CTC ACA TTG TTG ACA AAC ATC 931 

TVTilDNQHNLN8YITDLPOV202 
ACT CTT ACC AAT GAC CAC CAG CAC ATG CTT CAC ACT TAC ATT ACA GAC CTG TTC CAG CTG 991 

LLTGNCNrKVQVLICLLt.NLS222 
KTA CTT ACT CCA AAT CCA AAC ACG AAC GTC CAA GTT TTG AAA CTG CTT TTG AAT TTG NCT 1051 

6NPANrEGLLRA0V0SSfLS242 
CAA AAT CCA GCC ATG ACA GAA GGA CTT CTC CCT CCC CAA GTG GAT TCA TCA TTC CTT TYC 1111 

LY0SHVAKEILLRVLTLFQM262 
CTT TAT CAC ACC CAC CTA CCA AAG GAC ATT CTT CTT CCA CTA CTT ACC CTA TTT CAG AAT 1171 

I KMCLK t SGH LAVOPTF TEC 282 
ATA AAG AAC TCC CTC AM ATA CM CGC CAT TTA GCT GTC CAG CCT ACT TTC ACT CM GCT 1231 

SLFFtLN6EECAQKIRAtV0 302 
TCA TTG TTT TTC CTG TTA CAT GGA CM CM TGT GCC CAC AM ATA ACA GCT TTA GTT GAT 1291 

HHOAEVKEKVVTIIPKI* 320 
CAC CAT GAT CCA CAC GTC MC CM MC CTT CTA ACA ATA ATA CCC AAA ATC TCA 1345 

TTCCTCATATTTTTCCAAACACTMTCCACTCTCCATArAAATCTATTTTCTCTCTTCCTTATMCCCCATTCTCCCAC 1424 

CTCCrAMrrTAMCACTAMTATCACArTTTGTCArTMCACACCrATMCTTCCCCTCCTrCTCAGATTTATTTTCC 1503 

ACTATTTTGATCCCMCTCMTATMCACCTTCrACrCAMCCArTTATrTCTTTCTATTTrCCTATTTCCAMrcCrT 1582 

GTTATCTTCCCrACArCMCTCCCACTMCCTTTTTCACArTTMCCTACCCTTCTACCTTTTGMGTCATTTGCACrT 1661 

ACTCATCTCACACACCATCACTATTTGACTAMrCArTCTTTCACAACTCMTACTCTTGTTCTTTTAGTAGCMTGM 1740 

ArCCTMCCTCrTGACCCCATTCACCTCCCAACCTGACCATACTCCTTTCAAAACTCrTrTCTCArCACTACMrCTAr 1819 
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TTTGGTCACTTCTJUn'CAATIMAAAAT6TAMCTmAG(MIMCMTGmCCT^ 1898 
TACATATiUAArACTCTUTCAATCACMTinmrcnTAGAOim 1977 

OXT6TAArCCCA(XTACTT(UUUG(U:CGACGCAGGAGAATT(XTT<MACCCGCGAC 22U 

ATAGCGCCATTGCAacCAGCCTGGCCAACAACAGCAAAACTCTGTCTCAAAAAAAAAAAAA^ 2293 

TGTGCTTAAGTGGAAAGATATCTArGAAATATGCTGGmnTAAAACACAAAAAmTAGAATATGGGATCCCCTG^^ 2372 

TGTGTQTGTGTGTGTCTGTCTCTGTGTarGrGTGTGTGTGTGnTGAATGAAAAATGCTTATCTATTGACAGAACACn 245 1 

CTAGAATGArACCCMACTCaGGAGTGGGACTGGCGAATCCCncrACGTACACACTGTTCTACTGm 2530 

AATATCACCCCAAATTGTATAATCnTnnAATAA A C G GCAGMAAATCAAAA A AAA A AAAA A A 2595 
Input file T187human123; Output File T187human123.pat 
Sequence length 2700 

AC6GAGCAATCOTCCnCACCCCCCGCG6GAAGA6AC(U»^^ 79 

CTaCCCCCTCCCCgCCACCCTCaaCCCCCCCCTCCOTTCr<XW 158 

CTCCCCTC6GAGAACCC6CCCC(UCCg6CCCiaMrrGGACT6CagGCT 237 

CC(U(»TCAGACCCUTTTCCnTCTCCIICATCCAC6rCAGGTGIXin'TTC^ 316 

M 6 2 

AGA(XrcC6CCCT(XCCCCCCC6A6CaCCTGCCCTCaXCCGCIXTGC6GCTCTGCC^ ATG GGT 391 

GPRGAGUVAAGLLLGAGACY 22 

GGC CCC CCG GCC GCG CGC TGG GTG CCC GCG GGC CTQ CTG CTC GGC CCG GCC GCC TGC TAG 451 

CIYRLTR6RRRG0RELGIRS 42 

TGC AH TAC AGC CTG ACC CG6 GGT C6G CCG CGG GGC GAC CCC GAG CTC GGG ATA CGC TCT 511 



SKSAGALEEGTSEGQLCGRS 62 
rCG AAG rCC GCA GGT GCC CTG GAA GAA GGG ACG TCA GAG GGT CAN TTG TGC GGG CGC TCG 571 



AIKPOTCGTUESOUSICTSOPE 82 
GCC CGG CCT CAC ACN GGA GGT ACC TGG GAG TCA CA6 TGG TCC AAA ACC TCG CAN CCT GAA 631 

DLTDGSVOOVINAEOLQICLL 102 
GAC TTA ACT GAT GGT TCA TAT GAT GAT GTT CTA AAT 6CT GAA CAA CTT GAG AAA CTC CTT 691 

YLLESTEDPVIIERALITLG 122 
TAC CTG CTG GAG TCA ACG GAG GAT CCT 6TA ATT ATT GAA AGA GCT TTG ATT ACT TTG GGT 751 

NNAAFSVNQIPNICIVTGITF 142 
AAC AAT GCA CCC TTT TCA GTT AAC CAA ATC CCT ATG AAG TTG 6TC ACT GGC ATC ACA TTC 811 

AIIRELGGIPl VAMlCtNHSII 162 
GCT ATT ATT CGT GAA HG GGT CCT ATT CCA AH GTT GCA AAC AAA ATC AAC CAT TCC AAC 871 

QSIKeKALVAlNNLSVIIVEN 182 
CAG AGT ATT AAA GAG AAA GCT HA AAT GCA CTA AAT AAC CTG ACT GTG AAT GTT GAA AAT 931 

QIKIKIYISOVCEDVFSGPL 202 
CAA ATC AAG ATA AAG ATA TAC ATC AGT CAA CTA TGT GAG GAT GTC TTC TCT GGT CCT CTG 991 

NSAVOLAGLTLLTNHTVTNO 222 
AAC TCT GCT GTC CAC CTG GCT GGA CTG ACA TTG TTG ACA AAC ATG ACT GTT ACC AAT GAC 1051 

HQHNlNSYtTOLFOVLLTGN 242 
CAC CAG CAC ATC CTT CAC AGT TAC ATT ACA GAC CTG TTC CAG GTC KTA CTT ACT GGA AAT 1111 

GMTKVOVIICLLLNLSENPAH 262 
GGA AAC ACG AAG GTG CAA GTT TTC AAA CTG CTT TTC AAT TTG NCT CAA AAT CCA GCC ATG 1171 

TECLLRAOVDSSFLSLYDSH 282 
ACA GAA GGA CTT CTC CCT GCC CAA GTG GAT TCA TCA TTC CTT TTC CTT TAT GAC AGC CAC 1231 

VAKEILLRVLTlfQNlKHCL 302 
CTA GCA AAG GAG ATT CTT CTT CGA GTA CTF ACC CTA TTT CAG AAT ATA AAG AAC TCC CTC 1291 

ICIEGNLAVOPTFTEGSLFFL 322 
AAA ATA GAA GGC CAT TTA GCT CTG CAG CCT ACT TTC ACT GAA GGT TCA TTG TTT TTC CTC 1351 

IHGEECAQICIRALVOHHOAE 342 
TTA CAT GCA GAA GAA TGT GCC CAG AAA ATA AGA CCT TTA GTT GAT CAC CAT GAT CCA GAG 1411 

VICEKVVTIIPICI* 355 
GTG AAG GAA AAG GTT GTA ACA ATA ATA CCC AAA ATC TCA 1450 

rTGCTCATATTTTTCCAAAGAGTAATGCACTCTGCATATAAATGrATTTTCTGTCrTCCTTArAAGCCGATTCTCCCAG 1529 

CTGCTAAATTTAAACACTAAATATCACATTTTCTCATTAACACACCTArAACTTCCCCTCCTTCTCAGATTTATTTTCC 1608 

ACrArrTTGArGCCAAGTCAArArAAGACCTTGrACTGAAACCATTTATTrCTTTCTArTTTCCTArrTGCAAArCCTT 1687 
GTTATCTTCCCTACATGAAGTG6CAGTiUCCTrmCACATTT/UGCTACCCTTCTACCrTT76M0T(MTTTGCAGTT 1766 
ACTCATaCAiMCACCAraGTAmGWCTAAArCAnGTnCAOUCTGMTAGTCnCTTCT^ 1845 
ArCCTAA(XTmCA<a«CATTCACCTCCCAACCTaACCATAaCCTnc^^ 1924 
TTTGinCAmCTA6TCAATGAAAAArCTAAAaTrTAG(MGAGAATGTTTCCrA6(MCTC^ 2003 
rACATATAAAArAGTGTCATCAATCACAArCTCCATCTTTAIMCAGTTGCTTAAATAAAmrcrGGTC^ 2082 
CCGTGCTCCGCCCCGTGGCTCTTGCCTCTAATCCCAGaCTTTGGGAGGCTGAGCCGGGCACATCACCTGAGArCGGGA 2161 
GmGAGACCAAGCCTGACCAATATCGAGAAACCCrGTCTCTACTAAGAATACAAAATTAGCTCGGCATGGTa^ 2240 
GCaGTAArCCCAGCTAaTGGGAGGCCGAGGCAGGAGAATTGCTTGAACCCGGGAGCCAGACGTTGCAGTGAGGTGAG 2319 
ATAGCCCCATTGCACTCCACCCTCCCCAACAACAGOUUACTCTGTaCAAAAAAAAAAAAA^ 2398 
TGTGCnAAGTCGAAAGATATCTATGAAATATGGTGGmmAAAACACAAAAAnArAGAATATGGGATC^^ 2477 
rGTGTGTGTGTCTGTGTCTGTGTGTGTGTGrGrGTGTGTGTGmGAArGAAAAATGCTTATGTATTCMCAGAAU 2556 
CTAGAArGArACCCAAAaCCTCGAGTGGGACTGCCCAATQCCTTCTACCTACACACTGTTCTACTGm 2635 
AATATGAGCCCAAATTGTATAATCTTTnTTAArAAACCGCA G A A AAATCAAA A AAAAAAAA A AA 2700 
Input file T187human12; Output File T187human12.pat 
Sequence length 2523 

caicfiCCTCCCccauKaiCCCCQMasQAcw 79 

TTCClttCCCCGCCGTCTCCGCGTIKCCCIXACCQTCCGACC^^ 158 

CTCCCCTGGGAGMGCCCCCCaSACGCCCCGOXTGCACTia^^ 237 

CCGACCTCAGACCCCATTTCCmCTCCACATCCACKSTCAGGTGtXCTnGa^^^ 316 

H 6 2 

AGmTcascQcr&sccccoiccAfuxJcaccm atg ggt 391 

GPRGAGWVAAGLLLCAGACY 22 

GGC CCC CGG GGC GCG GCC tCG GTG GCG GCG GCC CT6 CTC CTC GCC CCC CGC GCC TGC TAC 451 

ClYRLTRGRftllQOllELGtRS 42 

TGC ATT TAC ACG CTC ACC CGG GGT CGG CGG CGG GGC GAC CGC GAG CTC GG6 ATA CGC TCT 511 



SKSAGALEEGTSEGOLCGRS 62 
TCG AAG TCC CCA GGT GCC CTC GAA GAA GGG ACG TCA GAG GGT CAM HG TGC CGG CGC TCG 571 



ARPQTGGTU ESOUSKTSXPE 82 
GCC CGG CCT CAG ACN GGA GGT ACC TGG GAG TCA CAG TGC TCC AAG ACC TCG CAN CCT GAA 631 

DLT0GSY00VLNAE0ia)CLLl02 
GAC TTA ACT GAT CCT TCA TAT CAT GAT GTT CTA AAT Ca GAA CAA CTT CAG AAA CTC CTT 691 

YLLeSTEOPVt IERAltTLGl22 
TAC CTC CTG GAG TCA ACG GAC CAT CCT GTA ATT ATT GAA AGA CCT TTC ATT ACT TT6 GGT 751 

||ttAAPSVN0|PlflCLVTG|TFU2 
AAC AAT GCA GCC m TCA GTT AAC CAA ATC CCT ATG AAG TTG GTC ACT GCC ATC ACA TTC 811 

AI fRELGGIPIVANICINH$tt162 
CCT ATT ATT CGT GAA HG GGT GGT AH CCA ATT GTT GCA AAC AAA ATC AAC CAT TCC AAC . 871 

QSC1CEICAL1IALNIILSVMVEN182 
CAG ACT ATT AAA GAG AAA GCT TTA AAT GCA CTA AAT AAC CTG ACT GTG AAT GTT GAA AAT 931 



01IClKVaVL)CLLLNLSEMPA202 
CAA Arc AAG ATA AAC GTG CAA GTT TTG AAA CTG CTT TTG AAT TTG NCT GAA AAT CCA GCC 991 



NrEGLLRAOV0SSFL$LY0S222 
ATG ACA GAA GGA CTT CTC CGT GCC CAA GTG GAT TCA TCA TTC CTT TYC CTT TAT GAC AGC 1051 

HVAICEILLRVLTLFQMI)CNC242 
CAC GTA CCA AAC GAG ATT CTT CTT CGA GTA CTT ACG CTA TTT CAG AAT ATA AAC AAC TCC 1111 

LICtEGHLAV0.PTFTEGSL FF262 
CTC AAA ATA GAA GGC CAT TTA GCT GTG CAG CCT ACT TTC ACT GAA GGT TCA TTG TTT TTC 1171 

LLHC£ECAOieiRALVDNHI>A282 
CTG TTA CAT CGA GAA GAA TGT GCC CAG AAA ATA AGA GCT TTA GTT GAT CAC CAT GAT GCA 1231 

EVKEKVVTIIPICI* 296 
CAC GTG AAG CAA AAG CTT CTA ACA ATA ATA CCC AAA ATC TGA 1273 

TTCCTCATATTrTrcCAAAGAGTAATGCAGTCTGGATATAAATGTATTTTaGTCTTCCTTATAAGGGGATrCTCCCAG 1352 

CrCCrAAATTTAAACAGTAAATATCACATTTTGTCATTAACACAGCTATAACTTGCCGTGGTTCTCAGATTrATTTTGG 1431 

ACTATTTTGArGCCAACTGAATArAAGAGCTTCTACTGAAACCATTTArTTCTTTCTATTTTCCTATTTGCAAATCCTT 1510 

CTTArCTTCCCTACArGAACTCGCAGTAACCTTTTTCACATTTAAGCTACCCTTCTACCrTTTGAACTGATTTCCAGTT 1589 

ACTCArCTGAGACAGCATCAGTATTTGACrAAATCATTGTTTaCAAGTGAATAGTaTGTTCrTTTACrACCAATCAA 1668 

ArCCTAAGCTCTTGAGGCCArrCACCTCCCAAaTGACCATACTCCTTTCAAAACTCTTrraCATCAGTACAArCTAr 1747 



FI6. M') ( 
TTTG6TCACTTCTACrCMrGAAAMT6TAMCTTnAGIUQUMATGmCCTA(UUCTU^ 1826 
TACATATiUWATACTCTCUTCAATCACAATGTCCATCTTTACACACTTCCnAAATAAATTATCTtOT 1905 

CCCTGCTC(aCCC(WT(KiCTCTTCCCTCTAATCCCA(Xyu:TTT(K(W\CCCTCAGCCGGG 1984 

QTnGAGACCAACCCTGACCAATATGGAGAAACCCTCTCTCTACrAAGAATACAAAArTAGCTGCKCATGGTGCTGM 2063 

GCCTGTAATCCCACCTACTTCGGAGGCCCAGGCAGGAGAArTGCnGAACCCGCGAGCCACAGCTTCCACTGACGTGAG 2142 

ATAGCIXCATTGCACTCCACCCTCGGCAACAAGAGCAAAACTCTGTCTCAAAAAAAAAAAAAAArGATGM 2221 

TGTeaTAAGTGGAAAGATATCTArGAAArArCCTCCnrTTTAAAACACAAAAATTATAGAATATGG^^ 2300 

TGTGrfn^GTCTGTCrGTGTGrGTGTCTCTGTGrGTCTGTGrGTTTGAATGAAAAATGCTTATOTArTCACAGA^ 2379 

CTAGAATGATACCCAAACTCCTGCAGTGGGAGTGGCGAATGCCTTCTACGTACACACTGrTCTACTCTTTGAATTTTTT 2456 

AATATCAGCCCAAATTCTATAATCnTTTTTAATAAAGGGGACAAAAATCAAAAAAAAAAAAAAA 2523 
Input file T187human2; Output File T187human2.pat 
Sequence length 2410 

CGMGCTCAGACCCCATTTCCTnCTCCACATCCAGGTOlCCTOMm 316 

N G 2 

AGACCTCCCCCCT(»CCCCCCCCMfXCTCCTCCCCT(»XCCC(»CCTGC6GCTCTI^^ ATO 6GT 391 

GPRGAGUVAAGLLLGAGACY 22 
GGC eCC CGG CCC GCC GGC TG6 GTG GCC 6CG GCC CTC CTG CTC GCC 6CG GGC GCC TGC TAG 451 

CtYftLTRGRRRGORELGIftS 42 
TGC ATT TAG ACC CTG ACC CGG OCT CGG CGG CGG GGC GAC CGC GAG CTC GGG ATA CGC TCT 511 

SICSAE0LTD6SY0OVLNAE0 62 
TCG AAG TCC CCA GAA GAC TTA ACT CAT GGT TCA TAT GAT GAT GTT CTA AAT GCT GAA CAA 571 

tOKLLYLLESTEOPVIieftA 82 
CTT GAG AAA CTC CTT TAC CTG CTG GAC TCA ACQ GAG GAT CCT GTA ATT ATT GAA A6A GCT 631 

LITLGIIMAAPSVIIOIPKICLV102 
TTG AH ACT JJQ GGT AAC AAT GCA GCC VfT TCA GTT AAC CAA ATC CCT ATG AAG HC CTC 691 

TGtrFAIIREl6GtPtVANIC122 
ACT GGC ATC ACA TTC GCT ATT ATT CGT CAA TTG GGT GGT AH CCA ATT GTT GCA AAC AAA 751 

IIIHSIIQ$IICEKALNALNIIIS142 
ATC AAC CAT TCC AAC CAG AGT ATT AAA GAC AAA GCT TTA AAT GCA CTA AAT AAC CTG ACT 811 

VNVEMQIICIICV 0VLKLLLNL162 
GTG AAT CTT GAA AAT CAA ATC AAG ATA AAG GTG CAA CH TTG AAA CTG CTT TTC AAT TTG 871 

SENPAMTEGLIRA0V0SSF1182 
NCT GAA AAT CCA GCC ATG ACA CAA GCA CTT CTC CCT GCC CAA CTG GAT TCA TCA TTC CTT 931 

SLYI)SHVAfCEILlRVLTLFQ202 
TTC CTT TAT GAC AGC CAC GTA GCA AAG GAG ATT CTT CTT CGA GTA CTT ACG CTA TTT CAG 991 

|||ICllCllClEGHtAVOPTFTE222 
AAT ATA AAG AAC TGC CTC AAA ATA GAA GGC CAT HA GCT GTG CAG CCT ACT TTC ACT GAA 1051 

GSLFFlLHG£ECA0ICrRALV242 
CCT TCA TTG TTT TTC CTG TTA CAT CGA GAA GAA TGT GCC CAG AAA ATA AGA GCT TTA CTT 1111 

OHHDAEVICEICVVTriPICI* 261 
GAT CAC CAT GAT GCA GAG GTG AAG GAA AAG GTT CTA ACA ATA ATA CCC AAA ATC TCA 1168 

TTCGTCATATTTTTCCAAAGAGTAATCCACTCTG6ATATAAATGTATTTTCTGTCTTCCTTATAAGGCGATTCTCCCAC 1247 

CTCCTAAATTTAAACAGTAAATATCACATTTTCTCATTAACACAGCTATAACTTCCCGT6GTTCTCAGATTTATTTTCC 1326 

ACTATTTTCATCCCAAGTCAATATAAGACCTTGTACTCAAACCATTTATTTCTTTCTATTTTCCTATTTCCAAATGCTT 1405 

GTTATCTTCCCTACATCAAGTCCCAGTAACCTTTTTCACATTTAA6CTACCCTTCTACCTTTTCAA6TGATTTGCAGTT 1484 

ACTCATCrGAGACACCATCAGTATTTGACTAAATCATTGTTTCACAACTGAArAGTCTTGTTCTTTTAGTAGCAATGAA 1563 

ATCCTAAGCTCTTGACGCCATTCACCTGCCAACCTGACCATACTGCTTTCAAAAGTCTTTTCTCArCAGTAGAATCTAT 1642 

rTTGGTCACTTCTAGTCAATGAAAAATGTAAACTTTTAGGAGAGAATGTTTCCTAGGACrCACCCACTCCATTCAATGT 1721 

TACArATAAAATAGTCTGArCAATCACAArGTCCATCTTTAGACAGTTGCTTAAArAAATTArCTGGrCTTTGAAAAGA 1800 

CCGTCCTGGGCCCGGTCGCrCTTGCCTGTAArCCCACCACTTTGGGAGGCTGAGGCGGGCAGATCACCTGAGATCCCGA 1879 

GTTTGAGACCAAGCCTGACCAArATGCACAAACCCTGTCTCTACTAAGAATACAAAArTAGCTGGCCATGGTGGTGCAT 1958 
GCCTCTMTCCCACCTACnCCCACIXClUCttAgSACMnCCTTCAAC^ 2037

AJtiSiasaiATrCCACTCCAGCaCCCIM 2116 

TCTCCTTMGTCCAAACATATCTATCMAATATGfiTGCTTTTTTAAAACACAAi^ 2195 

TGTCTCTGTCTCTCTCTGTCTCTGTCTCTCTCTCTCTGTCTCmCAATttUUUATCCTTATCTATTW 2274 

CTA6AATGATACCCAAACTCCTGGAGT(UUMGTGCGGAATGCCTTCTACGTACACACTGnCTACTGTn 2353 

AATATGACCCCAAATTGTATAATCTTTTmAATA A AGCG G ACAAAA A TCAAAAAAAAAAAAAA A 2418 
Input file T187human3; Output File T187human3.pat
Sequence length 2562 



CCACCCCTCCC6CCAGCCCCCCGAGG6A6GM t CgTTQCTTCAl 79

nCCGCGCCCCCGCGTl

g CCCCC CCTCCCCCTCTCCACCCCI

CTCKCTGCGMUAI

CQAGTG66CQCTTATiU»xm(»GCTAIUXan'TTCC(»SAG6 237

CG(UGCrCA(»CCCCAnTCCTTTCTCCACATCCAGGTCAC6rGCCGmGCTGTGGCG» 316



6PRGAGWVAAGLLLGAGACY 22 
GGC CCC CGC GGC GCG GGC TGG GTG GCG GCG GCC CTG CTG CTC 6GC GCG GGC GCC TCC TAG 451 

CIYRLTRGRRRGOREIGIRS 42 
rCC AH TAC AGG CTQ ACC CGG GGT CCG CCC CGG GCC GAC CGC GAG CTC GGG ATA CGC TCT 511 

SKSAEDLTOGSYOOVLMAea 62 
TCG AAG TCC GCA CAA GAC TTA ACT GAT GGT TCA TAT GAT GAT GTT CTA AAT GCT CAA CAA 571 

LOKLLYLLeSTeOPVIIERA 82 
Cn CAG AAA CTC CTT TAC aC CTG GAG TCA ACQ GAG GAT CCT GTA ATT ATT GM AGA GCT 631 

LITLGMMAAFSVNOAl rRELl02 
TTG ATT ACT TTG GGT AAC AAT GCA GCC TTT TCA GTT AAC CAA CH AH ATT CGT CAA TT6 691 

GGlPIVAIIKtllN5NQ51ICEJC122 
GGT GGT ATT CCA ATT GTT GCA AAC AAA ATC AAC CAT TCC AAC CAG ACT AH AAA GAG AAA 751 

ALNA^NIILSVNVENQI ICI KI 142 
GCT TTA AAT GCA CTA AAT AAC CTG AGT GTG AAT GTT GAA AAT CM ATC AAG ATA AAG ATA 811 

YtSOVCE0VFSQPL8$AVOL162 
TAC ATC AGT CAA GTA TGT GAG GAT GTC TTC TCT GGT CCT CTG AAC TCT GCT GTG CAQ CTG 871 

AGLTLLTNMTVTIIDH0HNLN182 
GCT GGA CTG ACA HG TTG ACA AAC ATG ACT GTT ACC AAT GAC CAC CAG CAC ATG CTT CAC 931 



$YrT0lFQVLlTGHGNTlCVQa02 
AGT TAC ATT ACA GAC CTG TTC CAG CTG KTA CTT ACT GGA AAT GGA AAC ACC AAG GTG CAA 991 



VLKLLLNLSEIIPAHTEG.ILR222 
GTT TTG AAA CTG CTT TTG AAT TTG NCT GAA AAT CCA GCC ATG ACA CAA GGA CTT CTC CGT 1051



AOVDSSFt$LYOSHVAICEIt.242 
GCC CAA GTG CAT TCA TCA TTC CTT TYC CTT TAT GAC AGC CAC GTA GCA ilAG GAG ATT CTT 1111 

LRVlTlFGIIIICNCLXtEGHl262 
CTT CGA GTA CTT ACC CTA TTT CAG AAT ATA AAC AAC TGC CTC AAA ATA GAA GGC CAT TTA 1171 

AVQPrFTEGSLFFLlHCEEC2a2 
CCT CTG CAG CCT ACT TTC ACT GAA CGT TCA TTG TTT TTC CTG TTA CAf GCA CAA CAA TCr 1231 

AQKtRALVOHHOA£VICEICVV302 
GCC CAG AAA ATA AGA GCT TTA GTT GAT CAC CAT GAT GCA CAG GTG AAG CAA AAC GTT GTA 1291 

T 1 I P K I • 309 
ACA ATA ATA CCC AAA ATC TGA 1312 

TTGGTCArATTTTTCCAAAGAGrAArGCACTCTGCATATAAATCrATTTrCTGTCTTCCrTATAACGGGArTCTCCCAC 1391 

CTGCTAAATTTAAACAGTAAATArCACATTTTGTCATTAACACAGCTATAACTrCCCGTCGTTCTCAGATrTATTTTCC 1470

ACTATTrTCATGCCAAGTGAArArAACAGCTTGTACTGAAACCATTTATTTCTTTCTATTTTCCTATTTGCAAATCCTr 1549 

GrTATCTTCCCTACATCAAGTCGCAGTAACCrTTTTCACATTTAAGCrACCCTTCTACCTTTTGAACTGATTTGCACTr 1628 



M G 2
AGACCTCCGCGCTGGCCCCCGCGAGCCTCCTGCCCTI ATG GGT 391

ACTCATCTGACACACCATCACTATrrGACTAAArCArrCTTrCACAACTCAATAGTCTTGrTCTTTTAGTAGCAATCAA 1707
ATCCT/UCamcUUUXOiTTCACCTGCCMCCTGACCATACTGCmCAAA^ 1786
mi»m:ACTTCTA(n'CMrGAAAAArGTAAACmTAIXUGAGAATGmCCT^ 1865
TACATATiUMTACTCTtUTCMTCACAATCTCCATCnTACAa(nTCCTT/UATW^^ 1944

can'GCTGCGC6CGCTGGCTCnCCCTGTAATCCCACCACnTG(»M»:crGAGGCGG(^ 2023 

GTnGAGACCAAGCCTGACCAATATCCAGAAACCCTGTCrCTACTAAGAATACAAAATTAGCTGCGM 2102 

GCCTGTAATCCCACCTACTTGGGACGCCSAGGCAGGAIMAnGmGAACCCGCGAaUA^ 2181 

ATACCCCCAnGCACTCCA(aCT6CCCAAaU C ACCAAAACTCTCTCTCAA A AAAA^ 2260 

TGTGCTTAAGTGGAAAIMTATCTATGAAArATGGTGGnTmAAAAaCAAAAATTATAGAATATG^^ 2339 
rGTGrGTGTGTGTGTGTQTGTGTCTGTGTGTGTGTGTGTGTGTTTGAATGAAAAATGCTTATGTATTGACAGAACACTT 2418 
CTACAATGATACCCAAACTCCTCCAGTGGCACTGGCCAATGCCTTCTACCTACAaCTCTTCTACTGmGAAnTTn 2497 

AATATCAGCCCAAATTGTATAATCTTmTTAATAAAGCGGAGAAAAATCAAAAAAAAAAAAAAA 2562 
PCT/US99/22817



Input file T187human; Output File T187human.pat
Sequence length 2365

n'f m rmwftCMGcccCT'cmi/^cfiCTfCy-^'^T 237 

CGGA(XTCAGACCCCATTTCCTnCTCaCATCCA(»TCA(»T(»CGmCCTm 316 

N G 2 

A(UCCTCCGCGCT6GCCCCC6CCA6CCTCCTGCCaGGCCCIHiCCCTCCGGCTCrGCCG ATG 6GT 391 

GPR6A6UVAAGLLLGAGACY 22 
66C CCC CGG GGC GCG GGC T6G 6T6 GCG GCG GGC CTG CT6 CTC GCC 6CG 6GC 6CC TCC TAC 451 

CIYRLTRGARRGO RELGIRS 42 
TGC ATT TAC AGG CTG ACC CGG G6T CCC CGG CGG GCC GAC CCC GAG CTC GCG ATA OGC TCT 511

SKSAEDLTOGSrOOVLNAEO 62 
TC6 AA6 TCC 6CA 6AA GAC TTA ACT GAT GGT TCA TAT GAT GAT GH CTA AAT 6CT 6AA CAA 571 

LOKLCYLLESTEDPVIISRA 82 
Cn CAG AAA CTC CTT TAC CTG CTG GAG TCA ACQ GAG GAT CCT 6TA AH ATT GAA A6A GCT 631 

LITLGMMAAFSVMOAI rREtl02 
m ATT ACT TTG GGT AAC AAT GCA GCC HT TCA CTT AAC CAA GCT ATT ATT CCT GAA TT6 691

GGlPIVA»lCINHSNQStlCEKt22 
C6T GGT ATT CCA ATT CTT GCA AAC AAA ATC AAC CAT TCC AAC CAG ACT ATT AAA GAG AAA 751 

ALNALNMLSVNVEN0I1C1ICV142 
GCT TTA AAT GCA CTA AAT AAC CTG AGT GTG AAT GTT GAA AAT CAA ATC AAC ATA AA6 GT6 811 



OVLlCLLLNLSENI»AI9rEGLLl62 
CAA GTT TTG AAA CTG CTT TTG AAT TTG NCT GAA AAT CCA GCC ATG ACA GAA 6GA CTT CTC 871

RAOVOSSFLSLYOSHVAICEI 182
CGT GCC CAA CTG CAT TCA TCA TTC CTT TfQ CTT TAT GAC AGC CAC CTA GCA AAG GAG ATT 931

LLtVLTLFOHIKMCLICIEGH202 
CTT CTT CGA CTA CTT ACG CTA TTT CAG AAT ATA AAG AAC TCC CTC AAA ATA GAA GGC CAT 991 

tAVQPTFTEGSLFFLLHGEE222 
TTA CCT GTG CAG CCT ACT TTC ACT CAA GCT TCA TTG TTT TTC CTC TTA CAT GCA GM CAA 1051 

CA0ICIRAIV0HHDAEVKEKV242 
TGT GCC CAG AAA ATA AGA GCT TTA GTT GAT CAC CAT GAT GCA CAG CTC AAG GAA AAG CTT 1111

V T I I P K t • 250
CTA ACA ATA ATA CCC AAA ATC TGA 1135

rTCCTCATATTTTTCCAAAGAGTAATGCACTCTCGArArAAArcTATTTTCTCTCTTCCTTArAACCGCArTCTCCCAC 1214 

CTCCTAAATTfAAACAGTAAATATCACArTTTCTCATTAACACACCTATAACTTCCCGTCCTTCTCACATTTATTTTCC 1293 

ACrATTTTCATCCCAAGTGAATATAAGACCTTCTACrCAAACCArTTATncnTCTATTTTGCTATTTCCAAATCCTT 1372 

CTrArCTrCCCTACATGAACTCGCACTAACCTTTTTCACATTTAAGCTACCCTTCTACCTTTTCAAGrCATTTCCAGTT 1451 

ACTCArCTGAGACAGCArCAGTATTrCACTAAATCATTCTTTCACAACTGAArACTCTTGTTCTTTrACTACCAArCAA 1530 

ArCCTAAGCTCTTCAGGCCATTCACCTCCCAACCrGACCATACTGCTTTCAAAAGTCrTTTCrCArCACTAGAATCTAT 1609 

rTrCGrCACrTCrACrCAATGAAAAArcrAAACTTrTACGACAGAATCTTTCCTACCACTCACCCACTCCArTCAArCT 1688 

rACATATAAAArAGTGTGATCAATCACAATGTCCATCTTTAGACAGTTGGTTAAArAAATTATCTCGTCTTTCAAAAGA 1767

CCCrtXrCCCCCCCGTCCCTCrrGCCTGTAArCCCAGCACTTrGCGAGGCTGAGGCCGGCAGATCACCTGAGArCGC^ 1846

CTTTGAGACCAAGCCrCACCAArATGGAGAAACCCTGrcrCTACTAAGAATACAAAATTAGCTGGGCATGGTGGTGCAr 1925 



(XaCTAATCCCA(XTAaTCGGACCCCCAGCCA(»AGMTT(XnCAACCCCCW 2004

ATACCGCCAniMCTCCAGCaCCCXAACAAGAGCAAAACTCTCTaC^^ 2083

TCTCCTTAACT(XA>WCATATCTATGAAArATCCTCCTTTmAAAACACAAAAATTATAGA^ 2162
TCTCTCTGTGTCTGTGTCTCTCrCTCTGTCTGTGTCrCTCTCTTTGAATaAAAAATGCTTATCrArrGACACAACACTT 2241 

CTAGAATCATACCCAAACTCCTGGACTCCCACTCCCGAATGCCTTCTACGTACACACTGTTCTACTCTTTCAAn^ 2320 

AArATCAGCCCAAATTCTATAATCTTTTnTAATAAACCGCAOAAAAATCAAA A A A A A AAAAAAA 2385 
Input file ri81Atn«161a; Output File T1dUaMl8ta.pat
Sequence length 3919

cctOTCTGccsorncTAcian'TccAcccaagTTcc^ 79 

MAQIGAVVAVASSFFCAS t8 
ACTG ATC GCT CAG TTC 6GA GCT GTT GTG 6CC 6TG Ga TCC ACT TTC HT TCT GCA TCT 137 

tPSAVHClEEG HIGVYYilGG 58 
CTC nC TCA GCT GTG CAC AA6 ATA CAA GAG CGA CAT ATT GGA CTA TAT TAC AGA GCT GGT 197 

ALLTSTSGPGFHLHLPFITS 58 
6CC CTC CTG ACC TCC ACC ACT C6C CCG GGT HC CAT CTC ATG CTC CCG TTC ATC ACA TCC 237 

rKSVQTTLOTOEVlCIIVPCGT 78 
TAT AAG TCT CTA CAG ACC ACT CTC CAA ACT GAT GAA GTG AAG AAC GTA CCA TGT GGA ACC 317 

SGGVMIYFORIEVVNFLVPN 98 
AGT GGT GGT CTG ATG ATC TAC TTT GAC AGA ATT GAA CTG GTG AAC RC CTG GTC CCA AAT 377 

AVTDlVKIIYTAI»TDiCAl.lFK 118 
OCA GTG TAT GAT ATA GTG AAG AAC TAT ACT GCA GAC TAT GAC AAG GCC aC ATC TTC AAC 437 

KtNHElNQFCSVHTLQEVYI 138 
AAG ATC CAT CAT GAG CH AAC CAG TTC TGC AGC GTT CAT ACT CTT CAG GAA GTC TAT ATC 497 

ELFOOfOENllClALQQOLrs 158 
GAG CTG TTT GAT CAA ATT GAT GAA AAC CTC AAG TTG GCT TTG CAG CAG GAC CTG ACT TCC 557 

NAPGLVtOAVRVTICPMIPEA 178 
ATG GCC CCT GGG CTG GTT ATC CAA GCT 6TG CGA GTG ACA AAG CCC AAT ATA CCT GAG GCA 617 

IRftNYELNESEKTICLllAAO 198 
ATC CGC AGG AAC TAT GAG CTG ATG GAA AGC GAG AAC AC6 AAG CH CTC ATT GCA GCC CAG 677 

ICOKVVEICEAETERKlCALleA 218 
AAG CAG AAG GTG GTG GAA AAG GAC GCA GAA ACA GAG ACC AAG AAG GCC CTC ATT GAG GCA 737 

EKVAOVAEITYGQKVHEICET S8 
GAA AAA GTG GCA CAG GTT GCA GAA ATC ACC TAT CGG CAA AAG CTG ATG CAC AAG GAC ACA 797 

eiClCfSElEDAAFLAIIElCAKA 258 
GAG AAC AAG ATC TCA CAA ATT GAA GAT GCT GCG HC CTG GCC CGG GAG AAG CCG AAG GCC 837 

OAECYTALKIAEANKLKLTP 278 
GAC GCT GAG TGC TAC ACA GCG CTG AAG ATC CCA GAA GCA AAT AAG CTC AAG CTG ACT CCA 917 

EYLOLHKYKAIASiiSKfYFG 298 
GAA TAC CTG CAG CTC ATG AAG TAC AAG GCC ATT GCT TCC AAC AGC AAG ATT TAC TTC GCC 977 

ICDtPNNFHDSAGGlGICOFEC 318 
AAA GAC ATC CCC AAC ATG TTT ATG CAT TCC GCA GGG GGG CTC GGC AAG CAG TTT GAG GGG 1037 

LSOOKLGFGLEOEPLEAPTK 338 
CTG AGC GAC GAC AAG CTG GGC TTT CGC CTA GAA GAT GAG CCC CTC GAC GCA CCC ACA AAG 1097 

EM* 341 
GAG AAC TGA 1106

GGAAACACTCTCTCCAAGCTCTCCTC(»»X:AGCTTAGAGAGACCTGTATTCTTTAAGATGAGACAGACCAA^ 1185 

TCCTTTCCACACTACCTTCCTTCACTCTTCTTACTGTGGTTAAAAAGGAACAAATCGACACAAACTTACCCCCTTCTGC 1264 

GAACGGAGAGCAGATGGAGACTTGTTTrTTGCGTTrATTTTTAArTCAGGTAAGTAAGTTGTATCACTTCTGACAAGCT 1343 

GTATGCACCCrAGATTTGACCTCTGACCTGCAGACACCAACATrGTCACrrrGAAGCrGGTTTAAGTGGAGCTACTGTC 1422 

AGTArGAAGAGCGAGACTGTGTGCTGCCTCCTCGTGCTTCAATTCCTTCACGGAAAAGTGTACTCCACAGTTCTCrCCC 1501 

TTGCCTCTAGTCTAGCCAGTGTCTGCGTGTGCGCCTCGTGACAGAACCCCCTCTGCTGCGGAACATGAGCTGCAGAGAG 1560 

CCTrGGCCCGCTGCCCTTTTTGACTGAGTCCATTACTTCACAGrrAAGCTGrcrTGAGCCCTrTTTAGCAAGAACTTCG 1659 

TGCrAGGTrrrGCAAGGTrrTCTACACACTGTACTCrCCTCTAGTCTTTGTTGGCTACArCTCACCCCAGCAGCGCTTG 1738 
CTaiCACCACACACTCCmTCCCTACm(UCCT«TCTCT«^ 1817

CACTCMCGTTiUUUTGGGMCMACAAGTGCTGTTAGCrCMTGA 1896 

CTGT(MCTMTTATOTTATCCmTtSAi»(XAAACATCTmTCATTATGCAW^ 1975 

CT(«T(U>AGAAGCGCCCAGCCA1UTGACACCCAAGTA6TAGTGCCTGTGGCCT 2054 

AACMGAGCAGGCAGCCACTTGAGAGTCGGCTCCAGrCACTCACCCTAGGAAA 2133 

GAAACfiCAmCTTATCCTCAAATTGCACTCCCGCTCCCttTCTACCATCCCCTgTCACTCC l iC^ 2212 

G6AG6IXMXTCTGCA6GTAATCTGCIU»CAT6CCA6TACCCT6TI»AACCATGA^^ 2291 

nsnACCnmCCCCTACaCATCCTCCCCCCACACAAAGCACaACTGTTCTCTaTAGGTGACT 237D 

ArmCTGCCATCAATTCCCACCT«GmTCGTmGTAAfiTC6CCCCACTTTGCTCCTAACrcca 2449 

ACGTATTTGGCAAGCATTCACCCGACCCAAAAACAGCCAGGCnCACTCTGCTTAm Z52B 

TGACTCCTCAGCCCACTGACCCTCGCCACACTCTACAAACTACAAAATGTTCCTGAAAAGGA 2607 

AAGCTCTTCOUUUCTGCCTTmTTTCCCCAACACCAACTCArcncnCTCAmfin^ 2686 

CAiaAACCTCCTATACCCACCATCCTCTCnGTACSTCCACCTGACAAA AC »CrACnCAgl^ 2765 

GAGCGTACCCCGCCArCCAGCCCCCTCaAGCCCGAGAGGCTCn^CTAACTAGCAn 2644 

AAAQUiCCACAGTAAAGTCCTCCTGCAGCTGCTCCnCCGTCCCCCm 2923 

CCCCATGTCATAGCMTAAATTCAGTACCTATTGGTATCTaTCCCMSCAISTAAAA 3002 

CATCCCATGCCTAGCCCATCTGTCnTATGACCTTGmnTCTAATACTATAAAATCTGACTTACGCAmGM 3081 
AAACATGTAAAATGTGATAAGCCTGCAGTTTTGTAGGCACTGAATTCATAGCTCaAnmAAGTAGAACTTCTAT^ 3 160 

AAATACCTTAACCCmGTAAAATTCACrTTTTGTAGGACmCCCAACCCCCAOCCA^ 3239 

rCAaAAATGTTGCACAACCAAmATArTCCATArAGGrTmAATCACmT^ 3318 

CGAAGCCTAAGTnAATAATTmATATAACTAAAAATAGCTOTQUGGACT 3397 

rCCTCT«AAAGCCUCGTCTArAAACCCCCT0TC66GCCCrCTCTCnCTO>^ 3476 

GGGaCTCGrCACaCGCOTCCGTAAACTACCTGGACAArAGCCCCTCTCTCTGGGAAOT 3555 

rCAGTGGGCTmACCCACTQTTTGTTTCCrTATAAAACCTGTAArOCCCAArCATGronmACTTm^ 3634 

rAmCTACTTCTCTGTAAACTCCTGATTCMTACTTAAAGDWTTTTTTCAGTGTCCCCaUCCGM^ 3713 

TTATAACTCACAAArCATTaTCTTATACTAATTATTCCArAAATOATACCACTACATAAATTACaTCGCTTAA 3792 
rcCAGGATrTGTTTCAGACAACAAAAAAAGCTCTCAATGrGAArATACrTACArTTTCGATnAATTTCAGrCTTCCTA 3871 

AATAAAATGTTTTTGTCTTTTTTTGATTAACGTAAAAAAAAAAAAAAA 3919 
Input file T182mouse; Output File T182mouse.pat
Sequence length 3087

MMHTQAftL 8 
6(UUCCCCGa:TCCaM(MTCCCT(UCTGACCCGAG(MAm ATG AAT AT6 ACT CAA CCC CCG CTT 68 

LVAAVVGLVAILLYASIHiei 28 
CT6 6T6 GCT CCA GTG GT6 GG6 TTG GTG 6CG ATC CTC CTG TAC GCC TCC ATC CAC AAG ATC 128 

EEGIILAVYYRGGALLTSPSG 48 
GAA GAG GGA CAC TTG CCC GTG TAC TAC ACG GGA GGA GCT TTG CTA ACG AGC CCC ACT GGA 188 

pCTHlHLPFItTFRSVQTTt 68 
CCA G6C TAT CAT ATC ATG TTG CCT TTC ATT ACA ACA TTC AGA TCT GTG CA6 ACA ACA CTA 248 

OTOeVKIIVPCGTSQGVNtYl 88 
CAA ACG GAT GAA GTT AAA AAT GTG CCT TCT GGA ACA ACT 6GT GGA CTC ATG ATC TAT ATT 308 

DRIEVVMHIAPYAVFDXVAM 108 
GAC C6A ATA GAA GTG GTT AAT ATG TTG GCT CCT TAT GCA GTG TH GAC ATT GTG AGG AAC 368 

YTAOYDKTLtFNKIHNELNQ 128 
TAT ACT GCA GAC TAC GAC AAG ACT HA ATC TTC AAT AAA ATC CAC CAT GAC CTG AAC CAC 428 

fcSAHTlOEVYIELFOOIOC 148 
TTT TCC ACT GCC CAC ACA CTT CAA GAA GTT TAC ATA GAA TTG TTT GAT CAA ATA GAT GAA 488 

NLKQALOlCOtllTNAPGLTIO 168 
AAC CTC AAG CAG CCC CTG CAA AAA CAT HA AAC ACC ATG GCC CCA GCT CTC ACT ATC CAG 548

AVRVTKPKtPEAIffRNFEl.H 188 
GCT GTG CCT CTT ACA AAA CCC AAA ATC CCA GAA CCC ATA AGA AGA AAT m GAA HA ATG 608 

PAElCTIClLIAAQkQiCVVeXE 208 
GAG GCA GAG AAG Aa AAA CTT CTC ATA CCT GCA MG AAA CAA AAG GTG CTG GAC AAA GAA 668 

AETERKRAVIEAEKIAOVAr 228 
GCT CAC ACG CAG AGG AAA ACG CCT GTT ATA CAA GCA GAG AAG ATT GCA CAA CTA GCA AAA 728 

IRFOOieVHCIceTEKRISElE 2^8 
ATT CGA m CAA CAC AAA GTG ATC GAG AAA GAA ACT GAA AAA CCC ATT TCT GAG ATT GAA 788 

DAAFLAREKAICAOAEYyAAN 268 
GAT GCT GCG TTC CTG GCC CGA GAG AAG CCA AAA GCA GAT GCC GAG TAT TAC GCT CCA CAC 848 

ICYATSHlCIIICtTPEYLELICrY 208 
AAA TAC GCC ACC TCA AAC AAG CAC AAA CTG ACC CCA GAG TAT CTG GAG CTC AAG AAA TAC 908 

OAIASMSKIYFGSIIIPSMFV 308 
CAC CCC ATT GCC TCA AAC ACT AAG ATC TAC TTT 6GC ACC AAC ATC CCC ACC ATC TTT GTG 968 

OSSCALKTSDCRTCREDSLP 328 
GAC TCC TCC TCT GCT CTG AAA TAC TCT GAT CCT Aim ACT CCG AGA GAA GAC TCC CTT CCC 1028 

PECAREPSCESPiaNKENAG 348 
CCA CAC CAC GCC CGT CAG CCC TCT GGA CAC ACC CCC ATC CAA AAC AAG GAG AAC GCA GCT 1088

• 349 



TGCAACACGTCCAAArCTTCTCCCArArCAACATCCGACCCAAGGGCCTAACrCGGAACACTGGTTATCTGGACTCGTA 1 1 70 

AGATTCACAGAGAATGTGTGCTCTCTTGTGATrCTCTTGrCATAGTCCTGCTTTCCCACCrGACTACAGGATACACCCA 1249 

CCTGTCTCCCACTCAAACCCTCTCTCCACCCACAGTTTTATCAACTATCCTGTATCTGTTCCTTTCTAAACCGCTACTC 1328 

ATCAATCACCCAAACTCTCATCCTAACATACTGCCTCCACTCCAATCTCAAACACTATATAACAACCTCTCCTTTTTAA 1407 

AACCrATTGAArAATCTTTACATTCCrCCCTGAGGACATCTGTCCTCAGACATTCAACACCTACCACCCCAGAGAGAAG K86 

ACCrTCAGAAAACGCTAAGTrAAAGAAGACAACTGTCATCAGACACTTCGCACCCCGCCTCTCTTTAAACTCTAGTCCC 1565 

CGCATTCCrCCATGTGArTGACACCCAGACCTCTGGCTTCCCAGCAAATTATCTTCCACTTGAATGACCATTTAGTTCA 1644 



TACAAATTGrACCTTrCTCTTTTTCTACTCACGrTCCrCGCCrGCAGGCACeCCTACTrTCCCACCCCACCACACCrrC 1 723 
CTCCMMUTAnCCCMTCACTACTnArTGC0TTA66MUCTGA6AfiATATAGMA^ 1802

AAAGCCTIXACTCCACCAAACCTACG«TCCCTGTfSnTCCTCrAnCAGT(MT(^ 1861 

ATGT6TGAaAAAGTGCCCCGTnrAIXCACA(MCAACTGaTA(MTGTCACaaT^ 1960 

GCTTTAACCAGAaTAGOACCAGTGTCCAATTCCTGAnUCrGCACACTATTATGTCATAAn 2039 
TGTTTTTAAAACTCGATTTCCCGCACATTCATTCACCCCAACACTTCTATCTAAACCCCAAGCTTCTACGGCT6CTATC 2118 

GTCACTAACACACTGATTCTCCTTAAAGTAATTCTCGAAGT6T(UMACAAAGTGACCGA6ACAGM^ 2197 

nGTCTCCTTCCCTCimArGCAGATACCtUUGnCCrmCC/UCmCCCCTCCGCTA 2276 

GTGACnCCTGGG»GCCATTGAATTCAmTCCATGAGAAGATGACACAGTTACCXTGTCGCTATACGAGATa 2355 

ATCCAGACCmmCCCATCACATTAACTTTCCTGGAATATTGTGCTGCAlMGGTACACa^^ 2434 

TGACAGCTOTGTGTATACTGTGnGAACCCAGACAGAAAAGTAATGGGGCCACnCTGAAACCTCT^ 2S13 

TCACAGCACaAAAGOSnGTCCCAAACAnTTATTAAGAAAGTAAAGCCCAGAmGAAT 2592 

TTATAGTATAGAGGCATnCTAATATCGACAAAATAATmTCTCAmAATTATAGAAATTACCTTDUAC^ 2671 

CTCTTCTTTCGCCCTTCAAATACTCGTGTTACATTGTTGCTGCAGATAAATGATGAnGTCGTGGCATAT^ 2750 

TGAGCTCTCTGCTnCAnCCTACAGArGTTTCTanCCCAmAGTGAAATGCTGnGCOCCAAAGm^ 2829 

GCATTTCTTACCGCTaiTAGGCCCCGCTCACCAIXA C GCAACCCCCAnGTCAAAG^ 2908 
AGCTCCnArCGAGTGAGCTTCCCTGTGCCCACTCAGTGAAaAAGTCTGACCATCCTTCAGGGACGTTCaTTTGGTA 2987 

AATATACACTGTAATCTTTAAGTCTAAAmATATGTGAAACTTAACTTTmTAAAAACCTAAATAAAATTAm 3066 

TATCAAAAAAAAAAAAAAAAA 3087 
Input file Tia7A>me0649t1; Output File Tlfl7Aymj«064g11.pat
Sequence length 2883

GTCCAC«AA AA CCTCCncCAaAiaa»CAimTO 79 
TCCGATrnJUXAGGGOWTTCCGISAACCCOMX 158

N G G A ft 0 6 

CACCTCmCTCGCCCTGCCTCCTCGGCTCCaCCAGCrCCGAGGCAGCAGC ATG 66T G6C GCG C66 GAC 228 

VGUVAAGLVLGAGACYClYft 26 
GTG GGC TGG GTG CCA GCA GGG CTG GTC CTG GGC CCC G6C GCC TGC TAG TGT ATC TAC CGG 288 

LTftGPRftGGRRLRPSftSAEO i6 

CTG ACT CGG GGA CCG CGG CGA GGC GGT CCC CGA CTG CGC Ca TCG CGA TCC GCA GAA GAC 348 

LrOGSYOOItttAEOLKKllY 66 
CTA ACC GAT GCC TCC TAT GAC GAT ATC TTA AAT GCA GAG CAG CTT AAG AAA CTT CTG TAT 408 

LLESTOOPVITECALVTLOM 86 
CTG CTG GAG TCA ACC GAC GAT CCT GTC AH ACT GM AA6 GCC TTG GTC ACC TTG GGA AAT 468 

NAAFSTMOA I |RELGGrPtV106 
AAT CCA GCC TTC TCC ACT AAC CAG CCC ATT ATT CGT GAG TTG GGT GGT ATC CCA ATT GH 528 

GNKtNSLM0StieElCAlMALNl26 
GGA AAC AAA ATC AAC TCC CTG AAC CM ACT ATT AAA GAG AAA CCT TTA AAT GCA CTG AAT 588 

MLSVMVEIf0rietKIYVPOVC146 
AAC CTG AGT GTG AAT GTT GAA AAT CAA ACT AAG ATA AAG ATA TAC CTC CCT CAA GTC TGT 648 

EOVFA0PLHSAVOLA6LRLL166 
GAG GAC GTC TTT GCT GAC CCC CTG AAC TCT GCG GTG CAG CTG GCC GGA CTG AGG CTG CTG 708 

TMHTVTII0YQHLLSGSVAQL188 
ACA AAC ATG ACC GTC ACC AAC GAC TAT CAG CAC CTG CTC AGC GGC TCC GTC GCT GGC CTG 768 

FHLLLLGNG$T1CVOVLKILL206 
TTC CAC CTG CTG CTG CTG GGA AAC GCA AGC ACC AAG GTC CAG GH HQ AAG CTG CTT TTG 828 

NtSEN$ANTE6lLSVOVSRL226 
AAT no TCT GAG AAT TCA CCC ATG ACA GAA GCA CTA CTG AGT CTC CAA CTA ACT AGA TTA 888 

PTRFISAHIORF* 239 
CCT ACC CCG TTC ATT ACT GCA CAC ATA CAG AGA TTT TGA 927 

CAAATAGATCTGCAAAGGTATGCCCAAAAACATTCACACGAATTATTTCTGAAGATCAGrATTAAGCATArrTTGrTTT 1006 

TTAAAACTTCTCTGTCCCACCAGCACACTTTCCATCTCTCGCCACTTTCCAGTATTTTTCT6TCACTGCATTTTAAACT 1085 

TTGTTTTTTTTGTGCATGTGTACCTCAGCATTTCCTCAAACAACTCTACTGACTGAGTCCCCTGTCTGCGCTCCCTCCT 1 164 

GAGCATTCAGCCAGCACCACCAACTTCTTAGTCTTCCCATGGAACTTACCAGAAGCAA C CATGTAACAAATTACCAACA 1243 

CTGTTGAAAACATGTAACAAACCATTGAAAOUiTCCCTGTGCTCTGAAGAAGGCCAGGCGGTGTGAGCCCTCTGGACAA 1322 

ArCGAGCCATCTGCTCCGTCCTGTTACCAGAACTCTGTGTAAGAfiCTAArCCTGATTGAACTAATCTCTTCnACAAAA 1401 

ACTOUrAGArCCrAAACGGGTTCGmcCCAAATGCCTACACTCTGGMTTCeMAGAAATCTTAGTT^^ 1480 

CAAAACGTCATTTTCACTTGTAACArGGAATAAAAATGAAACATGTCCCnACGCTTGCCTGGAGTCAGACTTnACAG 15S9 

TGTTAACTAATCGATCCTGTTTrAAAATAGGACAGrGACCCTGTTTCCTCTTTCACCrGCArTCTTCATTCCTTTCCCT 1638 



rrArGACGCCCAACTAGCAAATGAGArrcrrCTrCGGCCrcrTACACTCTTTCACAATArAAACAACrGCCTCAAACTC 17^17 
GAACCCCCCrTACCTAArCAGArrCCTTrTCCTAAACGGrCArTCrTTTTTCTGrTATACCCAGAACAATCrCCCCAGA 1796 

AAArGAGACCrrrAGCCTGTCATCATGATGTGGATGTGAAACAGAAAGCTTTAGCAATAAAGCCGAAATTCTGATCCGr 1875 

TGCTCCTATTTTTATCAAAGACTCAAACAGTAACGCAGTCTTAAGTCAGCACACGGGAGCCTTTGCCTGCCTTTAAAAG 19S4 

GCCrCTTrCAGCCCArGGAGTTAAACAATAAAAGrGAGTGAGCAGCTCrAATCCAACACQITGTTCAAAATTTTAGATT 2033 

TTGGAGTAGTTCAGArTTGCGGTTTGGCCArTGAGrAGAGTCTGGAACCTTCCCAGCATGTCCATCATTTACGCCGCAA 2112 



AC0m(OT7AT(MTC6T(»MCA(UaGGCCATl»:TCnCAaSACTAmOUm 2191



aiUCTCCTCW^TTTGC/Um'GCTCATGACACACCCnTAAGrGCTGA^ 2349

ACmGTGCCTC/UUCCAGTtMATACTGCMGaCGAGTCCACCACCMCCCTGC^ 2428 

ATCCTGAGACACTCCCTCCACCAmCTGATaCTAOWaGTACTCCaiTmCATCtUAA^ 2507 

ACCCCmOTGTAAGATACTGCAGAGaCTCCMCCTTCCACCCACAC^ 2506 

TAAOTCCAGATGCUTAMTCCMGMAaTACIXATGAGATGGCTGCTTTGAAAGCATGCT 266S 

TCCCCTCTCT TTT T r nCT CICACTAATCATAAArACACTTATACATGC A aCAACAnTCTA^^ 2744 

TATAAmmAATAAAAAiaUAAAATGCAACCTCTACATAAAAAAAAMAAA/^^ 2883 



GAQCQOCTQTACTCMGATACTTCKTSAGGTAmAArGGmcCTlMCACCA^ 
Input file T21SAtfl0C2l5; Output File T21SAtnK215.pat
Sequence length 2744

HeLOSUAQLGlV 12 
CTCCOTACCGACACACCAACGGGAAACG ATG CAG CTA GAC ACA T6C GCC CAG TTG GGG CTG CTG 64 

fUQLLLtSSLPR6YTViN6A 32 
TTC CTG CAG CTC CTT aC ATC TCA TCG TTG CCA AGA GAG TAC AC6 GTC ATT MT GAA GCC 124 

CPGAEUNINCRECCEYOQIE S2 
TQT CCC 6CA GCT GAG TGC AAC ATC ATG TGT ACA GAA TCT TGT GAA TAT CAT CAG ATT GAA 184 

CLCPGICXEVVGYTtPCCRNE 72 
TGC CTC TGC CCA GGA AA6 AAG GAA GTG GTG CGT TAC ACC ATC CCA TGC TGC ACG AAT GAG 244 

0N6CDSC11HPGCT1FEMCJC 92 
GAT AAT GAA TGT GAC TCC TGT CTA ATT CAC CCA GGT TGT ACC ATC TTT GAA AAC TGC AAG 304 

SCRMGSUGGTLDOFYVICGFY 112 
AGC TGC C6C AAT GGC TCC TG6 GCC GGA ACT CTG GAT GAC HC TAC GTG AAG GGA HC TAC 364 

CACCRAGWTGGDCIIRCGQVL 132 
TGC GCA GAG TGC AGG GCA GGC T66 TAC GGA GGA GAC TGC ATG C6A TGT GGC CAG GTT CH 424 

RASKGQILLESYPlNAHCev 152 
CGA GCC TCA AAC GGT CAG ATC TTG TTC GAG ACC TAT CCC TTA AAC GCT CAC TGT GAA TGG 484

TINARPGFirOLRFQNLSLE 172
ACT AH CAT GCC AGA CCT GGG TTT ATC ATC CAS TTG AGO TTT GGT ATG CTG ACC CTA GAG 544

FDYNCOYDYVEVROGPMSDS 192 
TTT GAC TAC ATG TGC CAA TAT GAC TAT GTG GAG GTC CCC GAT GGG GAT AAT ACT GAC AGC 604 

PIIICRFCGNERPAPtR$TQS 212 
CCT ATC ATC AAG CCT TTC TGT GGC AAC GAG AGO CCA GCT CCC ATC AGG AGC ACT GGC TCT 664 

SLHVLFHSDGSKNFDGFNAV 232 
TCA CTC CAT GTC CH TTC CAT TCT GAT GCC TCC AAG AAC TTC GAT GGC TTC CAC GCT CTC 724 

FEEITACSSSPCFNOGTCIU 252 
TH GAG GAG ATC ACA CCC TGC TCC TCA TCC CCT TQT TTC CAT GAT CCC ACA TCC CTC CTT 784 

OTTGSFKCACLACYTGORCE 272 
GAC ACC ACT GCC TCT TTC AAG TCT GCC TCC CTG GCT GGC TAC ACT CCG CAQ CGC TGT GAA 844 

HLLEERMCSOLGQPVNGYKIC 292 
AAT CTA CTT GAA GAA AGA AAC TGC TCA GAC CTT GGG GGG CCA CTC AAT GGG TAC AAC AAA 904 

ITECPGLLNERNVICtGTVVS 312 
ATC ACA GAA CCT CCT CGA Cn CTC AAT GAG CGC CAT CTA AAA ATT GGC ACC GTT GTG TCT 964 

FFC NGSYVISCIIEXRTCOOII 332 
TTC TTT TGT AAC GGC TCA TAC CTT CTG ACT GCC AAT GAC AAA CGA ACT TGC CAG CAG AAT 1024 

GEUSCKQPVCNICACREPKIS 352 
CCA GAG TCG TCA CGA AAG CAA CCT CTC TGC ATG AAA GCC TGC CCC GAA CCC AAG ATC TCA 1084 

OLVRRRVLSMOVQSRETPLK 372 
GAC CTG CTG ACA ACC AGA GTC CTT TCC ATG CAC GTT CAG TCA ACC GAC ACA CCA TTA CAT 1144 

QLYSTAFSKQICLaOASTKKP 392 
CAG CTT TAT TCC ACG GCT TTC AGC AAC GAG AAA TTG CAG GAT CCC TCT ACC AAA AAG CCA 1204 

ALPFGDIPPCYOHLNTQVQY 412 
CCC CTT CCA TTT CGA GAC CTG CCC CCT GGA TAC CAA CAT CTG CAC ACC CAA GTC CAC TAT 1264 

EClSPFTRRlGSSRRTCCRT 432 
GAC TCC ATC TCG CCC TTC TAC CGC CGC CTC GCA ACC ACC ACC ACC ACA TCC CTG AGA ACT 1324 

CKUSGRAPSCIPIC6ICIEST 452 
CCC AAC TCG ACT GGG CCG CCC CCG TCC TCT ATC CCA ATC TGT GCA AAA ATC GAG ACC ACT 1384 

PSPKTOCTRUPWQAAIYRRT 472 
CCT TCT CCA AAG ACC CAA CCC ACC CCC TGC CCA TCC CAC GCA CCC ATC TAC CCG AGC ACC 1444 




SQVHD66LHIC6AUFLVC96A 492 
A6T G6T 6TA CAC GAT G6T GCST CT6 CAC AAA G6T 6CA TGG TTC HG GTC T6C ACT GOT CCC 1504 

tvMERTVVVAAHCVTELGlCA 512 
aC GTG AAT GAA CG6 ACT GT6 GTT GTG GCT 6CC CAC TCT GTO ACT GAG CTG GG6 AAG GCC 1564 

TiiiCTADllCVVlGlCFrilOOD 532 
ACC ATC ATC AAG ACA 6CA GAC CtC AAG GTT 6TC TT6 GGA AAA TTC TAG AGG GAC GAT GAT 1624 

RDEKSIQNIRVSAIILHPMY 552 
C6G GAT GAG AAG AGC ATC CAG AAT HA CG6 GH TOT GCT ATC ATT CTG CAC CCC AAC TAT 1684 

OPtLlOTOIAVlKLLDXAIII 572 
GAC CCT ATC CTG CTT GAC ACT GAC ATC GCT GTT CTG AAG CTC CTA GAC AAA GCT CGC ATC 1744 

STRVOPICLATTROLSTSFO 592 
ACT ACC CCT GTC CAA CCC ATC TGC CTG GCT ACC ACT CGG GAC CTC AGC ACC TCT HC CAC 1804 

ESHtTVAGUHIlAOVRSPGF 612 
GAA TCC CAC ATC MX GTG GCT GGC TG6 AAC ATC aG GCA GAT GTG AGG AGC CCT G6C TTT 1864 

ICN0TLHrGNVRVV0PNie6e 632 
AAG AAT GAT ACC TTA CAT TAT GGA ATG GTC AGA GTG GTA GAC CCA ATG CH TQT GAG GAA 1924 

OHGOHGIPVSVTDNNFCASK 652 
CAG CAT GAA GAC CAT CGC AH CCA GTT AGT GTC ACT GAC AAC ATG UC TGT GCC AGC AAA 1984 

OPSTPSDICTAETGGIAALS 672 
GAT CCC AGT ACC CCT TCT GAC ATC TGC ACT GCA GAG ACA CGG GGC ATC GCT GCT TT6 TCC 2044 

FPGRASPEPRVNLVGLVSWS 692 
nC CCA GGC CGA GCA TCC CCC GAG CCA CCC TGG CAT TT6 GTG GG6 CTG CTC AGC TGG AGC 2104 

Y0ICTCSMGL5TA PTICVLPFIC 712 
TAT GAC AAG ACA TGT AGC AAT GGC CTA TCC ACA GCC TTC ACA AAG CTG HG CC6 TTC AAA 2164 

OUlERMNie* 721
GAC TGG AH GAG AGA AAC ATG AAA TGA 2191

ACCACCCACAACGCCACTGAGAAGCCTTnCCTAGCATCCGTCTCTACATArCTTGTATAGAACAATCCGGCCCTGMC 2270

TGTAATTTTCCCCACCATCTTGGCTACTGAAAGCCTCCTGCTTTCAGGCACTTATCTCAATAGAGGGTGAACAGACTTT 2349

ACTTCATCAGGGAACTCTCTCCCTGACTGCTTGGGAATCATCTAAMGATGCCMCTCTTGCAACMCTGGATnCTTC 2428

AAAGMGACCATCTGACTAGAAGGAGMCCTCTTGCTCCTGCTCCACTCAGACTGATCTGACTGTCAATCAGnTGGGT 2507

TGAGAAGCTTGAnT(a»GA66CCTG6GCTGCACCTCGCTTCTQTCAAAGnCCAAAGAAOUACAACTTAGACTACCC 2586

CAGGGCAAAGGAGATTCGGTGTGGCACCCTCTGTAAATTCTCACAAGATTGTCTGATCCTTTCCCnTCCAATCTTCTG 2665

TACACATTTCAATAAAACAAGCTCTGCTCCCTGACCTACCAAACAAAAAAAAAAAAAA A AAAA A AAA A AAAAAAAAAAA 2744



